Implemented surface

This page is the reverse roadmap: a detailed inventory of behavior that is currently wired, where it is exposed, and the conditions that decide whether an idea is actually done.

Use it when a feature sounds implemented but you need to answer a sharper question: which path is wired, which clients expose it, what blocks it, and what is still missing.

Status meanings:

Status | Meaning
Implemented | The main path is wired and has tests or direct operational surface.
Partial | A useful path exists, but important surfaces, limits, or polish remain.
Planned | The docs or design mention it, but users should not depend on it yet.
Out of scope | The behavior is intentionally not planned.

See also: core model and partition routing.

Item | Status | Implemented surface
Topic plus optional group | Implemented | Broker, protocol, Rust client, TypeScript client, Python client, admin API, CLI
Default group normalization | Implemented | Empty group and default normalize to ungrouped on admin and CLI paths, and in Rust, TypeScript, and Python clients
Partitioned queue declaration | Implemented | DeclareQueue.partition_count, Rust client declare builder, config default, coordination catalogue
Producer partition routing | Implemented | Rust client topology cache, round-robin keyless routing, partition_key stable routing, version-fenced publish frames
Subscription fan-in | Implemented | Rust client opens per-partition subscriptions from topology and merges deliveries into one logical stream
Cluster catalogue (client) | Implemented | Rust, TypeScript, and Python clients expose the live set of declared queues and streams (with partition counts) via a snapshot accessor and a change-subscription, derived from topology and kept live by topology pushes (no extra round-trips)
Pattern subscribe / discovery routing (client) | Implemented | Opt-in client.routing() returns a routing view. subscribe_pattern (queues) and subscribe_stream_pattern (streams) fan in across every channel whose topic matches a *-glob and auto-attach channels that start matching later (driven by the catalogue feed). Manual and auto-ack variants. Rust, TypeScript, and Python. Client-side only, no broker changes
Operator-chosen partition id | Out of scope | Normal user-facing paths should choose queue and optional key, not a partition number

Conditions and limits:

  • A queue is addressed by topic plus optional group.
  • Groups are namespaces, not consumer groups. They do not coordinate competing consumer membership by themselves.
  • Declared queues can have more than one partition.
  • Partition selection stays inside Fibril. Producers may provide a partition key for stable routing, or omit it for round-robin spread.
  • If topology is unknown or standalone, clients conservatively use partition 0.
  • Partition count is fixed at creation in standalone mode. In Ganglion mode an experimental live-repartition path can grow or shrink a queue’s partition count (see the experimental cluster surface below).
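The routing rules above can be sketched in a few lines. This is a minimal illustration, not Fibril's client code: the PartitionRouter class is hypothetical, and the SHA-256 hash is an assumption, since this page does not say which hash the clients use for partition_key.

```python
import hashlib
import itertools
from typing import Optional


class PartitionRouter:
    """Illustrative client-side partition chooser (hypothetical, not Fibril's code)."""

    def __init__(self, partition_count: Optional[int]) -> None:
        # None models "topology is unknown or standalone".
        self.partition_count = partition_count
        self._round_robin = itertools.count()

    def choose(self, partition_key: Optional[bytes] = None) -> int:
        if not self.partition_count:
            # Conservative fallback: use partition 0.
            return 0
        if partition_key is None:
            # Keyless publishes spread round-robin across partitions.
            return next(self._round_robin) % self.partition_count
        # Keyed publishes map to a stable partition. The hash function is an
        # assumption made for this sketch.
        digest = hashlib.sha256(partition_key).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count
```

The key only influences which partition is chosen. It is not stored with the payload, which matches the statement that partition selection stays inside Fibril.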

See also: reliability semantics and many idle queues.

Item | Status | Implemented surface
Append-only message log | Implemented | Storage layer and broker publish path
Append-only event log | Implemented | Storage state changes and recovery
Snapshots | Implemented | Storage recovery and periodic snapshot path
Lazy startup indexing | Implemented | Existing queues can be indexed without materializing all queue state at startup
Queue recovery on first use | Implemented | First active operation can materialize and recover an indexed queue

Conditions and limits:

  • Durable messages and queue state live on disk.
  • A queue can exist on disk without being loaded into memory.
  • Loading a cold queue has a first-use cost.
  • Single-node queue deletion is exposed through the admin API/dashboard. A coordinated multi-node delete is still pending.
  • Message TTL (dropping individual messages by age) is implemented. Log retention by age (truncating old durable messages on a schedule) is not yet a user-facing feature.

See also: recovery quarantine.

Item | Status | Implemented surface
Recovery reference verification | Implemented | Recovery checks each replayed event’s referenced message offset against the message log’s durable tail, and decodes every event record
recovery.on_mismatch policy | Implemented | Startup config: quarantine (default), refuse, or ignore
Per-partition quarantine | Implemented | A bad partition is parked (its ops error) while the rest of the broker stays up
Operator repair | Implemented | Admin quarantine banner + /admin/api/quarantine/repair truncate-to-valid, follower re-fetches the dropped suffix on next catch-up
Readiness health | Implemented | /readyz reflects quarantine state and the configured policy
Quarantine metric | Implemented | recovery.quarantined gauge and quarantines_total counter in the recovery snapshot

Conditions and limits:

  • A dangling event reference is only possible as a lost-tail suffix (events are written after their messages), so truncate-to-valid is a complete repair.
  • A corrupt event record is the genuine mid-log failure and uses the same quarantine and truncate machinery.
  • refuse is lazy today (a mismatch is caught when the partition is first used); an eager whole-disk variant at boot is a tracked follow-up.
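A minimal sketch of the verification and truncate-to-valid idea follows. The Event type and function names are hypothetical. It assumes the durable tail is an exclusive upper bound on valid message offsets, and it guesses that ignore proceeds with the full event log; neither detail is stated on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    message_offset: int     # message-log offset this event refers to
    decodable: bool = True  # False models a corrupt event record


def valid_prefix_len(events: List[Event], durable_tail: int) -> int:
    """Count leading events that decode and reference a durable message."""
    for index, event in enumerate(events):
        if not event.decodable or event.message_offset >= durable_tail:
            return index
    return len(events)


def truncate_to_valid(events: List[Event], durable_tail: int) -> List[Event]:
    """Operator repair: keep only the valid prefix."""
    return events[: valid_prefix_len(events, durable_tail)]


def recover(events: List[Event], durable_tail: int,
            on_mismatch: str = "quarantine") -> Tuple[str, int]:
    """Return (status, replayable event count) for one partition."""
    keep = valid_prefix_len(events, durable_tail)
    if keep == len(events):
        return ("ok", keep)
    if on_mismatch == "refuse":
        raise RuntimeError("recovery mismatch: refusing to serve partition")
    if on_mismatch == "ignore":
        return ("ignored", len(events))  # assumption: proceed without parking
    return ("quarantined", keep)         # park only this partition
```

Because events are written after their messages, a dangling reference can only appear in the suffix, which is why cutting at the first invalid event is a complete repair in this model.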

See also: client usage and retries and delays.

Item | Status | Implemented surface
Unconfirmed publish | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Confirmed publish | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Pipelined confirmation handles | Implemented | Rust publish_with_confirmation, TypeScript publishWithConfirmation, Python publish_with_confirmation
Delayed publish | Implemented | TCP protocol, broker, Rust client, TypeScript client, Python client
Content type metadata | Implemented | Protocol metadata, Rust client, TypeScript client, Python client, delivery path
Reserved metadata headers | Implemented | Broker protocol handler rejects fibril.* and stroma.* user headers
Partition key routing | Implemented | Rust client NewMessage::partition_key, protocol publish metadata, server per-partition publish routing
Partitioning-version fence | Implemented | Client stamps routed version, and server redirects stale publishes before appending
Message TTL (drop by age) | Implemented | Publish.ttl_ms + per-queue default_message_ttl_ms on declare, owner resolves an absolute deadline, expiry worker drops via the DLQ/discard pipeline. Rust Publisher::expiring + QueueConfig::default_message_ttl, TypeScript Publisher.expiring + QueueConfig.defaultMessageTtl, Python Publisher.expiring + QueueConfig.default_message_ttl (seconds-native)

Conditions and limits:

  • Confirmed publish returns the broker-assigned offset.
  • Unconfirmed client calls only wait for the local client engine or command path, not for a broker-assigned offset.
  • Delayed publish uses a distinct delayed-publish frame and a not_before deadline.
  • Content type is stored outside the user header map for common cases.
  • Manual content-type headers are interpreted as content type metadata by clients.
  • Broker-side validation rejects reserved system header prefixes on normal and delayed publish.
  • A partition key affects only partition selection. It is not a RabbitMQ-style routing key and is not part of the durable payload.
  • Stale partitioning topology is handled by redirecting the client to refresh and retry.
  • Message TTL drops a message that is not consumed before its deadline. A per-message ttl_ms wins over the queue’s default_message_ttl_ms; with neither set a message never expires. The owner resolves the deadline against its own clock at publish, so it survives recovery and replication. An expired message is never dropped while it is in flight, and the drop honors the queue’s dead-letter policy (discard when no DLQ is configured, otherwise dead-lettered with reason expired). This is per-message age-drop, not queue expiration (auto-deleting an idle queue), which is a separate, not-yet-implemented idea.
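The TTL precedence and drop rules above can be written down as a small sketch. The function names are hypothetical, None stands for "not set", and treating a message as expired once the clock reaches the deadline is an assumption.

```python
import time
from typing import Optional


def resolve_deadline_ms(ttl_ms: Optional[int],
                        default_message_ttl_ms: Optional[int],
                        now_ms: Optional[int] = None) -> Optional[int]:
    """Resolve an absolute deadline at publish time on the owner's clock."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # A per-message ttl_ms wins over the queue default.
    effective = ttl_ms if ttl_ms is not None else default_message_ttl_ms
    if effective is None:
        return None  # neither set: the message never expires
    return now_ms + effective


def expiry_action(deadline_ms: Optional[int], now_ms: int,
                  in_flight: bool, has_dlq: bool) -> str:
    """Decide what an expiry sweep does with one message."""
    if deadline_ms is None or now_ms < deadline_ms:
        return "keep"
    if in_flight:
        return "keep"  # never dropped while in flight
    return "dead_letter:expired" if has_dlq else "discard"
```

Storing the absolute deadline rather than the relative TTL is what lets the value survive recovery and replication unchanged.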

See also: backpressure and client usage.

Item | Status | Implemented surface
Manual ack subscriptions | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Auto ack subscriptions | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Bounded prefetch | Implemented | Broker delivery path and clients
Backpressure | Implemented | Pull-based delivery bounded by prefetch
Unsubscribe redistribution | Implemented | Broker tests cover prefetched unacked messages returning to active subscribers
Competing consumers (default) | Implemented | Many consumers per queue, fair dispatch, unordered
Exclusive consumer groups | Partial | .exclusive()/consumer_target (Rust), consumerGroup()/consumerTarget() (TypeScript), and consumer_group()/consumer_target() (Python) + TCP protocol, per-partition gate, balanced+sticky assignment, soft consumer_target, assignment push, reconnect restore, one-cohort-per-queue guard
Cross-broker cohort coordination | Partial | Member identity + controller (aggregate→plan→publish) + owner apply all wired in cluster bootstrap, coordination-level multi-node rebalance test exists, fuller broker/client scenarios are still growing
Partition fan-in | Implemented | Both clients subscribe to all known partitions, merge deliveries while keeping per-partition settlement routing, and pick up partitions added by a live grow

See consumer groups for the user-facing model.

Conditions and limits:

  • Manual ack messages must be completed, failed, retried, or retried after a delay.
  • Auto ack happens only after the delivery frame is successfully sent.
  • Prefetch limits how many messages a subscription can hold at once.
  • If a subscription ends with prefetched but unsettled messages, those messages are returned for redelivery.
  • A message can be delivered more than once under failure, retry, or lease expiry conditions.
  • Exclusive consumer groups are opt-in (Rust .exclusive(), TypeScript .consumerGroup(), Python .consumer_group()). Without them, consumers compete (no ordering). A queue has a single exclusive cohort. The clients expose the assignment-events stream (Rust assignment_events(), TypeScript onAssignmentChange, Python on_assignment_change).
  • Cross-broker cohort balance is advisory/eventually-consistent. The per-partition delivery gate is always the correctness backstop. Single-node is fully covered by tests.
  • Plain subscriptions fan in over known partitions and pick up partitions added by a live grow. A topology warm step at connect prevents pure consumers from staying on partition 0 when topology is available.
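The prefetch and redelivery rules above can be modeled with a small in-memory window. This is a sketch only: PrefetchWindow is a hypothetical name, and the real broker tracks far more state (leases, delivery tags, storage).

```python
from collections import deque
from typing import Deque, List, Set


class PrefetchWindow:
    """Toy model of one subscription's bounded prefetch."""

    def __init__(self, ready: List[int], prefetch: int) -> None:
        self.ready: Deque[int] = deque(ready)  # offsets waiting for delivery
        self.prefetch = prefetch               # max unsettled messages held
        self.unsettled: Set[int] = set()

    def deliver(self) -> List[int]:
        """Deliver until the subscription holds `prefetch` unsettled messages."""
        sent: List[int] = []
        while self.ready and len(self.unsettled) < self.prefetch:
            offset = self.ready.popleft()
            self.unsettled.add(offset)
            sent.append(offset)
        return sent

    def ack(self, offset: int) -> None:
        """Settling a message frees one slot in the window."""
        self.unsettled.discard(offset)

    def close(self) -> List[int]:
        """Ending the subscription returns unsettled messages for redelivery."""
        returned = sorted(self.unsettled)
        self.unsettled.clear()
        self.ready.extendleft(reversed(returned))
        return returned
```

For example, with prefetch set to 2 and five ready messages, the first deliver call sends two messages and later calls send nothing until one of them is settled. That stall is the backpressure the table refers to.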

See also: Plexus streams and client usage.

Item | Status | Implemented surface
Stream channel type (fan-out) | Implemented | Stroma StreamEngine (cursors/retention), broker fan-out actor, TCP protocol (DeclarePlexus/SubscribeStream, reuses Publish/Deliver/Ack), Rust + TypeScript + Python clients
Declare plexus (partitions, durability, retention, replication factor) | Implemented | declare_plexus/declarePlexus + StreamConfig in all three clients, including a per-stream replication-factor override that beats the cluster default
Durable named cursor | Implemented | Broker-side cursor per (channel, partition, name), resuming on restart and advancing on ack
Ephemeral start position | Implemented | latest / earliest / offset / n-back / by-time
Header filter | Implemented | AND of header == pattern with * glob, stream-only
Client-side fan-in across partitions | Implemented | Reuses the queue fan-in supervisor (failover resubscribe + live-grow pickup). Streams stay out of the reconnect-reconcile registry and resume via the cursor
Durability tiers (ephemeral/speculative/durable) | Implemented | Express lane wired: durable fsyncs before deliver/confirm, speculative delivers off the staged offset and defers the confirm until durable (with a fibril.speculative header), ephemeral delivers and confirms at staging with no fsync (AfterWrite). All log-backed
Cross-client wire vectors | Implemented | DeclarePlexus/DeclarePlexusOk/SubscribeStream pinned in clients/wire_vectors.json, asserted by Rust/Python/TS

Conditions and limits:

  • Every consumer of a stream sees every record. Partitioning a stream is for write throughput and per-key ordering, not consumer work-sharing.
  • A durable name is single-active (last commit wins). Distinct names are independent fan-out consumers, each with its own per-partition cursor.
  • A fresh durable name starts at the earliest retained record so it cannot silently miss data.
  • Retention drops whole sealed segments (by age/bytes/records) and clamps a cursor that lags past the retained window.
  • The durability tier changes delivery and confirm timing end to end: speculative and ephemeral deliver off the staged offset (no fsync wait), with speculative deferring the confirm until durable and ephemeral confirming immediately.
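The stream header filter in the table above is described as an AND of header == pattern clauses with a * glob. A minimal sketch of that matching rule is shown here. The function names are hypothetical, and treating a missing header as a non-match is an assumption.

```python
import re
from typing import Dict


def glob_match(pattern: str, value: str) -> bool:
    """Match `value` against a pattern where only * is a wildcard."""
    # Escape every literal piece, then let each * match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def header_filter_matches(filters: Dict[str, str],
                          headers: Dict[str, str]) -> bool:
    """Every filter clause must match (logical AND)."""
    return all(
        name in headers and glob_match(pattern, headers[name])
        for name, pattern in filters.items()
    )
```

An empty filter matches every record in this sketch, which is consistent with the rule that every consumer of a stream sees every record unless it narrows the view itself.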

See also: reconnects and reconnection grace internals.

Item | Status | Implemented surface
Resume identity handshake | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Reconnect grace window | Implemented | Runtime settings and TCP handler. On by default in the server seed (connection.reconnect_grace_ms, 5s); opt out with 0. Both manual and auto-ack subscriptions participate
Conservative subscription reconciliation | Implemented | Broker, Rust client, TypeScript client, Python client
Restore-client-subscriptions policy | Implemented | Broker, Rust client, TypeScript client, Python client
Reconnect observability | Implemented | Admin overview, TCP metrics log, structured reconciliation logs
Planned restart drain notice | Partial | POST /admin/api/drain broadcasts a GoingAway push (grace deadline + message) to connected clients. All three clients surface it as an app-observable event (going_away_events / onGoingAway / on_going_away) so the app can wind down, then rely on reconnect-on-close for the redirect. Broker-side graceful ownership handoff on drain (for true zero-downtime) is pending
Durable broker restart reconciliation | Planned | Design notes only
In-flight publish replay | Out of scope | Clients do not replay old in-flight protocol requests

Conditions and limits:

  • Reconnect grace only applies while the broker process stays alive.
  • Resume requires the same client resume identity before grace expires.
  • The conservative policy keeps matching subscriptions, drops server-only subscriptions, and closes client-side streams that the broker cannot prove are still valid.
  • Restore mode can recreate missing server-side subscriptions reported by the client after a successful resume.
  • Current Rust, TypeScript, and Python subscription receive APIs surface reconciliation-closed streams as end-of-stream rather than a typed close reason.
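The two reconciliation policies can be pictured as set operations over the subscriptions each side knows about. This is a sketch under assumptions: subscriptions are reduced to hypothetical string keys, mismatched metadata is not modeled, and server-only subscriptions are assumed to be dropped in restore mode as well.

```python
from typing import Dict, List, Set


def reconcile(client_subs: Set[str], server_subs: Set[str],
              restore: bool = False) -> Dict[str, List[str]]:
    """Classify subscriptions after a successful resume."""
    matching = client_subs & server_subs
    client_only = client_subs - server_subs
    server_only = server_subs - client_subs
    return {
        # Both sides agree: the existing client stream keeps receiving.
        "keep": sorted(matching),
        # The broker drops subscriptions the client no longer reports.
        "dropped_by_broker": sorted(server_only),
        # Restore mode recreates what the client still owns.
        "recreated": sorted(client_only) if restore else [],
        # Conservative mode closes streams the broker cannot vouch for.
        "closed_client_side": [] if restore else sorted(client_only),
    }
```

A stream listed under closed_client_side is what the receive APIs currently surface as plain end-of-stream.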

See also: core model, reliability semantics, and retries and delays.

Item | Status | Implemented surface
Ack | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Nack without requeue | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Immediate retry | Implemented | TCP protocol, Rust client, TypeScript client, Python client
Delayed retry | Implemented | TCP protocol, broker, Rust client, TypeScript client, Python client
Lease expiry | Implemented | Runtime delivery settings and broker/storage path

Conditions and limits:

  • Ack settles a delivered message.
  • Fail means nack without requeue. Depending on queue policy, the message may be discarded or dead-lettered.
  • Retry means nack with requeue.
  • Delayed retry requires requeue=true and a not_before deadline.
  • Expired leases can return inflight messages to ready.
  • Lease timing is controlled by runtime settings.
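The settlement rules above map onto a small decision function. This sketch uses hypothetical names and state labels, and it leaves out max-retry routing, which can send a retried message to the DLQ once retries are exhausted.

```python
from typing import Optional


def settle(kind: str, requeue: bool = False,
           not_before_ms: Optional[int] = None,
           has_dlq: bool = False) -> str:
    """Return the state a delivered message moves to after settlement."""
    if kind == "ack":
        return "settled"
    if kind != "nack":
        raise ValueError(f"unknown settlement kind: {kind}")
    if not_before_ms is not None:
        # Delayed retry needs both requeue=true and a not_before deadline.
        if not requeue:
            raise ValueError("delayed retry requires requeue=true")
        return "delayed"
    if requeue:
        return "ready"  # immediate retry
    # Fail: nack without requeue follows the queue's dead-letter policy.
    return "dead_lettered" if has_dlq else "discarded"
```

An expired lease behaves like the immediate retry branch in this model: the in-flight message goes back to ready.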

See also: dead lettering and metadata policy.

Item | Status | Implemented surface
Per-queue DLQ policy | Implemented | Rust client, TypeScript client, Python client, fibrilctl, admin API
Global DLQ target | Implemented | Stroma-owned runtime state, admin UI/API, fibrilctl
Max retry routing | Implemented | Broker/storage path and tests
Dead-letter reasons | Implemented | retries_exhausted, terminal_nack, pending_recovery, expired (message TTL)
DLQ metadata | Implemented | Reserved stroma.dlq.* headers on dead-lettered messages
DLQ replay by selected offsets | Implemented | Broker/storage path, admin API, admin dashboard, fibrilctl
Bulk replay filters | Planned | Not yet implemented
Delete or ack DLQ items from replay | Planned | Replay copies back to source and leaves the DLQ message in place

Conditions and limits:

  • Queue policy can discard, use the global DLQ target, or use a custom queue-specific target.
  • The global target is live persisted runtime state, not startup config.
  • Replay requires active DLQ messages with source metadata.
  • Replay strips system metadata from the replayed copy.
  • Replay skips offsets that are missing, inactive, or missing required DLQ source metadata.
  • Replay currently accepts at most 100 offsets per request through the admin API.
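Because bulk replay filters are still planned, a caller that wants to replay many DLQ offsets has to split them to fit the 100-offset limit. A minimal sketch of that batching is shown here. The helper name is hypothetical, and sending each batch to the admin API is left out.

```python
from typing import Iterable, Iterator, List

MAX_REPLAY_OFFSETS = 100  # per-request limit stated above


def replay_batches(offsets: Iterable[int],
                   limit: int = MAX_REPLAY_OFFSETS) -> Iterator[List[int]]:
    """Yield de-duplicated, sorted offset batches of at most `limit` items."""
    unique = sorted(set(offsets))
    for start in range(0, len(unique), limit):
        yield unique[start:start + limit]
```

Replay leaves the DLQ message in place, so repeating a batch copies the same messages back to the source again. A caller should record which batches succeeded.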

See also: admin dashboard and dead lettering.

Item | Status | Implemented surface
Inspect active queue messages | Implemented | Admin API, admin dashboard, fibrilctl
Include settled offsets | Implemented | Admin API, admin dashboard, fibrilctl
Payload previews | Implemented | Admin API and dashboard, base64 encoded
Status filtering | Implemented | Admin API and dashboard
Paginated offset navigation | Implemented | Admin dashboard and API offset parameters
High-volume live polling view | Out of scope | Use metrics and logs instead

Conditions and limits:

  • By default, inspection returns messages still active in queue state.
  • Active states include ready, inflight, delayed, and pending DLQ.
  • Settled records can be included explicitly.
  • Offsets without matching inspectable state or log records are skipped rather than shown as misleading partial rows.
  • Inspecting a queue can load that queue into memory. If idle cleanup is enabled and no active publisher or subscriber exists, cleanup can unload it again.
  • Default page size is 50.
  • Current API hard cap is 5000 messages per request.
  • Payload previews default to 4096 bytes and cap at 1048576 bytes.
  • Inspection reads persisted data and can affect realtime performance on large requests.
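The inspection limits above can be illustrated with a small helper. The numbers come from this page. Clamping an oversized request to the cap (rather than rejecting it) is an assumption about the API's behavior, and the helper names are hypothetical.

```python
import base64
from typing import Optional

DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 5000
DEFAULT_PREVIEW_BYTES = 4096
MAX_PREVIEW_BYTES = 1_048_576


def clamp(requested: Optional[int], default: int, cap: int) -> int:
    """Use the default when unset, and never exceed the cap."""
    if requested is None:
        return default
    return max(0, min(requested, cap))


def payload_preview(payload: bytes, preview_bytes: Optional[int] = None) -> str:
    """Truncate the payload to the preview size, then base64 encode it."""
    limit = clamp(preview_bytes, DEFAULT_PREVIEW_BYTES, MAX_PREVIEW_BYTES)
    return base64.b64encode(payload[:limit]).decode("ascii")
```

Keeping previews and page sizes small matters because inspection reads persisted data and competes with realtime traffic.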

See also: many idle queues and idle queue internals.

Item | Status | Implemented surface
Lazy loading | Implemented | Storage and broker startup behavior
Idle queue cleanup | Implemented | Runtime settings, broker worker, Stroma unmaterialization
Publisher idle expiry | Implemented | Runtime settings and broker publisher cache cleanup on incoming connection frames
Cleanup race guard | Implemented | Broker prevents cleanup from racing with newly created publisher/subscriber leases
Active lease materialization | Implemented | Publisher and subscriber creation materialize storage before returning a usable handle
Inspection reload cleanup | Implemented | Queues loaded by admin inspection can be unloaded again by idle cleanup
Cleanup observability | Partial | Admin queues page, /admin/api/queues_debug, cumulative metrics counters
Exact cleanup timing | Out of scope | Cleanup is approximate and sweep based

Conditions for a queue to be unloaded from memory:

  • idle queue cleanup is enabled
  • the broker knows the queue as a cleanup candidate
  • no active subscribers remain
  • no active publishers remain
  • no messages are currently leased to consumers
  • no pending settlement work is still draining
  • no broker delivery tags remain active
  • storage reports no inflight messages
  • the configured idle window has elapsed
  • the cleanup sweep reaches that queue
  • storage accepts the unmaterialization attempt
  • no publisher or subscriber lease is being created at the same time as cleanup

Conditions that keep a queue in memory or skip cleanup:

  • active publisher or subscriber
  • a new publisher or subscriber lease arriving while cleanup is trying to unload the queue
  • not idle long enough
  • pending settlements
  • broker-tracked deliveries still active
  • storage-tracked inflight messages
  • storage race while another operation is materializing or changing the queue
  • queue not tracked by the broker for cleanup
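The two condition lists above amount to one predicate that a cleanup sweep evaluates per queue. The sketch below is illustrative only: the field names and skip-reason strings are hypothetical, and the final storage-side acceptance of the unload attempt is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueueActivity:
    # Field names are hypothetical; each mirrors one condition listed above.
    tracked: bool = True
    active_publishers: int = 0
    active_subscribers: int = 0
    lease_being_created: bool = False
    pending_settlements: int = 0
    broker_deliveries: int = 0
    storage_inflight: int = 0
    idle_ms: int = 0


def skip_reason(q: QueueActivity, cleanup_enabled: bool,
                idle_window_ms: int) -> Optional[str]:
    """Return why a sweep skips this queue, or None if it may try to unload."""
    if not cleanup_enabled:
        return "cleanup_disabled"
    if not q.tracked:
        return "not_tracked"
    if q.active_publishers or q.active_subscribers:
        return "active_lease"
    if q.lease_being_created:
        return "lease_race"
    if q.pending_settlements:
        return "pending_settlements"
    if q.broker_deliveries:
        return "broker_deliveries_active"
    if q.storage_inflight:
        return "storage_inflight"
    if q.idle_ms < idle_window_ms:
        return "not_idle_long_enough"
    return None
```

A None result only means the sweep may attempt the unload. Storage can still refuse if another operation is materializing or changing the queue at that moment.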

Operator-facing behavior:

  • Unloading is not deletion.
  • Durable messages, events, snapshots, and queue identity stay on disk.
  • A later publish, subscribe, or admin operation can reload the queue.
  • Creating a publisher or subscriber loads the queue before the handle is returned.
  • Admin message inspection can load a queue without creating a publisher or subscriber lease.
  • The first operation on a cold queue can pay reload cost.
  • Long-lived producer connections can keep queues active unless publisher idle expiry is enabled.
  • Automated cleanup does not keep rechecking storage after a queue is already unloaded unless new queue activity happens.

Observability currently includes:

  • loaded versus indexed-only queue state on the admin queues page
  • active publisher and subscriber counts
  • idle time and last-used time when known by the current process
  • last cleanup result or skip reason per tracked queue
  • cumulative cleanup attempts and selected outcomes in broker metrics snapshots

Developer-facing note:

  • Broker PublisherHandle is intentionally not cloneable. A broker publisher handle owns one sink task and one active-publisher lease. Creating another independently tracked publisher must go through get_publisher, which creates another sink and another lease.

See also: configuration, configuration policy, and configuration design.

Item | Status | Implemented surface
TOML startup config | Implemented | Config crate and server binary
Env and CLI overrides | Implemented | Config crate and server binary
Admin auth startup config | Implemented | TOML, env, CLI, server wiring
Keratin fsync and segment startup config | Implemented | Config crate and server wiring
Runtime delivery settings | Implemented | Runtime settings manager and admin UI/API
Runtime idle cleanup settings | Implemented | Runtime settings manager, admin UI/API, broker worker
Runtime locks | Implemented | Locked groups reject admin edits
Global DLQ runtime state | Implemented | Stroma-owned versioned setting
Corrupt runtime settings recovery UI | Partial | Load issue reporting exists, richer operator reset flow is pending

Conditions and limits:

  • Startup config is loaded before the process starts serving.
  • Startup precedence is defaults, TOML, environment, then CLI.
  • Runtime seeds initialize persisted runtime settings only when no runtime settings exist.
  • After runtime settings exist, persisted state owns those values unless a group is locked by startup config.
  • Runtime updates use expected versions to detect concurrent edits.
  • Locked runtime groups reject admin edits instead of silently applying changes.
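The precedence and seeding rules above can be sketched as two small functions. The function names and the flat key layout are hypothetical. Reading "locked by startup config" as "the startup value wins over persisted state" is an assumption about how locks resolve.

```python
from typing import Any, Dict, Mapping, Optional, Set


def resolve_startup(defaults: Mapping[str, Any], toml: Mapping[str, Any],
                    env: Mapping[str, Any], cli: Mapping[str, Any]) -> Dict[str, Any]:
    """Apply layers in precedence order; later layers override earlier ones."""
    resolved: Dict[str, Any] = {}
    for layer in (defaults, toml, env, cli):
        resolved.update(layer)
    return resolved


def effective_runtime(persisted: Optional[Mapping[str, Any]],
                      seed: Mapping[str, Any],
                      locked_groups: Set[str]) -> Dict[str, Any]:
    """Seed only when nothing is persisted; otherwise persisted state owns values."""
    if persisted is None:
        return dict(seed)
    effective = dict(persisted)
    for group in locked_groups:
        if group in seed:
            # Assumption: a locked group takes its value from startup config.
            effective[group] = seed[group]
    return effective
```

For example, a value such as connection.reconnect_grace_ms set in the server seed applies on first boot, but a later admin edit persists and survives restarts unless the group is locked.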

See also: admin dashboard.

Item | Status | Implemented surface
Overview metrics | Implemented | Dashboard and API
Connections and subscriptions | Implemented | Dashboard and API
Queues page | Implemented | Dashboard and API, per-partition expand, follower-replication view, DLQ-policy column, hide-inactive toggle + search filter
Create queue | Implemented | POST /admin/api/queues + dashboard form (partition count, optional DLQ policy, optional default message TTL)
Delete queue (single-node) | Implemented | POST /admin/api/queues/delete + per-row dashboard button; refuses while messages are inflight (409) and in cluster mode (501) pending coordinated teardown
Streams page | Implemented | Dashboard and API (GET /admin/api/streams), per-topic partition rows with head/tail/retained, declared durability and retention, plus per-partition role and applied offset and a follower-replication view (GET /admin/api/streams_debug)
Create stream | Implemented | POST /admin/api/streams + dashboard form (partition count, durability tier, optional retention by records/bytes/age)
Settings page | Implemented | Dashboard and API, incl. replication and streaming-replication settings
Message inspection page | Implemented | Dashboard and API
DLQ replay controls | Implemented | Dashboard and API
Topology page | Implemented | Coordination nodes, per-partition ownership/epochs, consensus block
Repartition control | Partial | /admin/api/repartition and a topology-page form (Ganglion mode)
Coordination membership control | Partial | Add/remove a consensus voting member from the topology page and API
Cohort visibility | Partial | Per-broker exclusive-cohort membership on the subscriptions page and /admin/api/cohorts
Quarantine banner and repair | Implemented | Global banner, /admin/api/quarantine, repair endpoint, /readyz
Basic admin auth | Implemented | Login, logout, session cookie, auth-disabled mode
Fine-grained admin roles | Planned | Current model is admin access or auth disabled

Conditions and limits:

  • When admin auth is enabled, dashboard pages require login.
  • When admin auth is disabled, dashboard pages are accessible directly and should be protected by network boundaries.
  • Settings updates use version checks.
  • The dashboard is for operational inspection, not continuous high-frequency monitoring.
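The version check on settings updates is a form of optimistic concurrency. A minimal sketch follows. The class and exception names are hypothetical, and using a single version for the whole settings document (rather than one per group) is an assumption.

```python
from typing import Any, Dict, Iterable


class VersionConflict(Exception):
    """Raised when the caller edited a stale copy of the settings."""


class GroupLocked(Exception):
    """Raised when startup config has locked the group."""


class RuntimeSettings:
    def __init__(self, values: Dict[str, Any],
                 locked_groups: Iterable[str] = ()) -> None:
        self.values = dict(values)
        self.version = 1
        self.locked = set(locked_groups)

    def update(self, group: str, value: Any, expected_version: int) -> int:
        """Apply an edit only if the caller saw the latest version."""
        if group in self.locked:
            raise GroupLocked(group)  # reject instead of silently applying
        if expected_version != self.version:
            raise VersionConflict(
                f"expected {expected_version}, current {self.version}")
        self.values[group] = value
        self.version += 1
        return self.version
```

Two operators who load the settings page at the same version cannot both save: the second save fails with a conflict and has to reload first.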

See also: source deployment and dead lettering.

Item | Status | Implemented command
Queue declaration | Implemented | fibrilctl queue declare
Global DLQ get, set, clear | Implemented | fibrilctl admin global-dlq
Message inspection | Implemented | fibrilctl admin messages
DLQ replay | Implemented | fibrilctl admin dlq replay
Queue observability | Implemented | fibrilctl admin queues
Pub or sub from CLI | Planned | Useful for manual testing, not currently implemented

Conditions and limits:

  • CLI uses broker TCP for queue declaration.
  • CLI uses admin HTTP for admin commands.
  • By default the CLI uses the same startup config path handling as the server config crate.
  • Container usage assumes the CLI runs where it can reach the broker or admin surface.

See also: client usage, quickstart, and reconnection grace.

Server-side reconnect grace for inflight settles is implemented in the TCP handler and applies when connection.reconnect_grace_ms is nonzero. Clients attempt one automatic reconnect before a new operation when the old engine is already closed. After a successful resume, clients and broker exchange subscription metadata. Active subscription streams continue when reconciliation confirms that the subscription should be kept. An opt-in restore policy can ask the broker to recreate subscriptions that the client still owns but the server does not currently have.

Item | Rust | TypeScript
Connect and auth | Implemented | Implemented
Publish unconfirmed | Implemented | Implemented
Publish confirmed | Implemented | Implemented
Pipelined confirmation handle | Implemented | Implemented
Delayed publish | Implemented | Implemented
Message TTL (expiring publisher + queue default_message_ttl) | Implemented | Implemented
Manual ack subscription | Implemented | Implemented
Auto ack subscription | Implemented | Implemented
Plexus stream declare + subscribe (durable cursor, filter, fan-in) | Implemented | Implemented
Exclusive consumer group | Implemented | Implemented
Assignment-change events | Implemented | Implemented (onAssignmentChange)
Resume identity handshake | Implemented | Implemented
Explicit reconnect outcome | Implemented | Implemented
Existing publishers after explicit reconnect | Implemented | Implemented
New subscriptions after explicit reconnect | Implemented | Implemented
Conservative automatic reconnect before new operation | Implemented | Implemented
Subscription reconciliation metadata exchange | Implemented | Implemented
Active subscription recovery after accepted resume | Implemented | Implemented
Opt-in subscription restore after accepted resume | Implemented | Implemented
Delayed retry | Implemented | Implemented
Queue declaration | Implemented | Implemented
Content type helpers | Implemented | Implemented
Default group normalization | Implemented | Implemented
Automatic reconnect and resubscribe | Planned | Planned

Conditions and limits:

  • Both clients expose msgpack, JSON, text, raw payloads, content type metadata, and custom user headers.
  • Both clients treat content type separately from the header map.
  • Both clients expose delayed publish and delayed retry.
  • TypeScript uses bigint for protocol u64 values such as offsets.
  • Explicit reconnect returns whether the broker accepted the resume identity.
  • Publisher handles use the latest client engine after explicit or automatic reconnect.
  • New subscriptions created after reconnect use the latest client engine.
  • Automatic reconnect is bounded by client policy and defaults to one attempt before a new operation.
  • After a successful resume, both clients send known subscription metadata and read the broker reconciliation result.
  • When the broker returns keep, both clients route later deliveries for that subscription into the existing stream.
  • If the broker keeps a subscription under a different server sub_id, both clients remap the existing stream to that server id.
  • The default reconciliation policy is conservative. Client-only or mismatched subscriptions are closed client-side, while server-only subscriptions are dropped by the broker.
  • The opt-in restore-client-subscriptions policy recreates client-owned subscriptions that are missing server-side, then keeps the existing client stream using the broker’s new subscription id.
  • Operations already in flight when the socket fails are not replayed.
  • Active subscriptions still need application-level handling when resume is rejected or reconciliation reports a mismatch.
  • Late settlements after a short disconnect are accepted only when the client explicitly resumes before grace expires.
  • Broker process restart reconciliation is not implemented. Current reconnect grace depends on the broker process keeping dormant connection state in memory.

See also: benchmarks.

Item | Status | Implemented surface
Throughput benchmark | Implemented | Rust bench helper and shell scripts
Steady-state rate benchmark | Implemented | Bench binary and scripts
Memory sampling during bench | Implemented | Bench scripts
Formal CI benchmark reporting | Planned | Local and informal docs exist, reproducible CI reporting is pending
One command pre-commit verification helper | Planned | Individual checks exist, unified helper is still pending

Conditions and limits:

  • Current benchmark numbers are local architecture checks.
  • They are not a stable published performance contract.
  • Hardware, storage, durability settings, payload size, queue depth, and batching strongly affect results.

See also: source deployment.

Item | Status | Implemented surface
Website Docker deployment | Implemented | Compose file and Traefik labels
Broker server image | Implemented | Dockerfile and publish workflow
fibrilctl in image | Implemented | Server image includes CLI
Source deployment docs | Implemented | Docs site
Full production hardening guide | Partial | Basic guidance exists, deeper ops runbook is pending

Conditions and limits:

  • Server image exposes broker TCP and admin HTTP ports.
  • Persistent broker data should be mounted under the configured data directory.
  • Admin auth should be enabled outside local protected environments.

Experimental Cluster and Replication Surface

See also: clustering and replication.

Item | Status | Implemented surface
Ganglion coordination mode | Partial | Startup config, embedded coordinator, TCP transport, broker self-registration, topology endpoint
Broker advertise address | Partial | broker.listener.advertise / FIBRIL_BROKER_ADVERTISE (priority-ordered list, peer-derived default), carried to clients as owner_endpoints; clients dial the first entry (Rust high-level client connects to socket-address entries only)
Queue catalogue and placement controller | Partial | Declared queues register partitions, controller assigns owners and followers, placement is stable and anti-churn
Partition ownership gate | Partial | Broker serves only assigned owners in Ganglion mode. Standalone mode owns all queues
Follower pull replication | Partial | Follower workers pull owner records over protocol, apply durably, install checkpoints when needed
Automatic failover | Partial | Dead owner can trigger epoch-bumped reassignment, follower promotion at local tail, stale owner demotion
Cold-restart orphan reconciliation | Partial | A partition reassigned away while a node was down is retained as inert on-disk cold storage after restart (ownership-gated serving means it is never served or materialized) and surfaced at startup. Reclaim of that disk is still manual
Epoch fencing | Implemented | Role transitions advance log epochs before serving or applying replicated batches
Replica-durable confirms | Partial | Owner waits for durable follower progress according to assignment policy, with timeout and ISR floor
Durable stream replication (Plexus) | Partial | Tier-gated: the durable tier replicates record + cursor logs to stream_replication_factor followers (express tiers stay owner-only), durable publishes confirm on replica durability, and a caught-up follower is promoted on owner failover. Reuses the queue follower-worker, confirm gate, and failover-candidate selection
min_in_sync_replicas | Implemented | Runtime setting, fail-fast publish refusal when healthy ISR is below floor
Live repartitioning | Partial | Grow or shrink a queue’s partition count in Ganglion mode (versioned routing, in-flight transition serialization, drain-and-retire on shrink); admin control + API
Topology visibility | Partial | Admin API/page (with repartition + coordination-membership controls) and fibrilctl topology. Cross-broker lag aggregation is pending
Live topology push | Partial | Broker pushes a TopologyUpdate to each connection when that connection’s routing content changes (not on every coordination metadata bump); the Rust, TypeScript, and Python clients apply it to their routing cache and ack the generation
Repartition cutover fencing | Partial | The controller fences a repartition’s finalize (retiring shrunk-away partitions and clearing the marker) on cluster-wide client adoption of the new routing, derived from topology acks, bounded by repartition_adoption_timeout_ms. Publish version-fencing remains the correctness backstop. See live routing and cutover
Cohort visibility (admin) | Partial | Per-broker exclusive-cohort membership on the subscriptions page and /admin/api/cohorts; cluster-wide cohort assignment is broker-local, not centrally committed
Multi-node cohort coordinator test | Partial | Coordination-level e2e covers cross-broker membership aggregation and rebalance. Full broker/client scenario coverage is still growing

Conditions and limits:

  • This surface is experimental on the replication/sharding branch.
  • Ganglion mode is the active embedded-coordination path. The older etcd-shaped plan remains useful as a design reference, but the current implementation uses the same coordination trait with Ganglion underneath.
  • Replication is follower-pull. Followers apply durably, then report progress through stamped replication reads.
  • Failover safety relies on assignment epochs plus local Stroma promotion gates.
  • Replica-durable confirms are meaningful only when the assignment durability policy requires more than the owner.
  • Aggregation of cross-broker topology lag and ISR into the topology page is still pending.
  • More failure testing is needed before treating this as production-ready HA.
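Two of the safety checks in this section, the min_in_sync_replicas floor and epoch fencing, can be pictured with a short sketch. The types and function names are hypothetical. Counting the owner as part of the healthy ISR, and rejecting any batch whose epoch is older than the local log epoch, are assumptions about the exact rules.

```python
from dataclasses import dataclass


class PublishRefused(Exception):
    """Fail-fast refusal when too few replicas are in sync."""


@dataclass
class PartitionAssignment:
    epoch: int
    healthy_isr: int  # in-sync replicas, counting the owner (assumption)


def admit_publish(assignment: PartitionAssignment,
                  min_in_sync_replicas: int) -> None:
    """Refuse the publish up front instead of waiting on a confirm timeout."""
    if assignment.healthy_isr < min_in_sync_replicas:
        raise PublishRefused(
            f"healthy ISR {assignment.healthy_isr} is below floor "
            f"{min_in_sync_replicas}")


def accept_replicated_batch(local_epoch: int, batch_epoch: int) -> bool:
    """Fence batches stamped with an epoch older than the local log epoch."""
    return batch_epoch >= local_epoch
```

In this model a demoted owner that keeps sending batches at its old epoch is ignored once the follower has advanced its epoch during promotion.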

Item | Status | Notes
Transactions | Out of scope | Not planned
Production-ready clustered HA | Planned | Experimental coordination, replication, and failover are wired, but hardening and runbooks remain
Single-node queue deletion | Implemented | Admin API/dashboard; see Admin Surface
Multi-node queue deletion | Planned | Coordinated teardown across replicas is pending
Message TTL (drop by age) | Implemented | Per-message + per-queue default; see Publish
Queue expiration (auto-delete idle queue) | Planned | Distinct from message TTL; needs global/coordinated idle tracking
Log retention by age (truncate old messages) | Planned or undecided | Not currently exposed as a user feature
Message purge (empty a queue) | Planned | Re-scoped: needs a replicated reset, not in-memory only
Python client | Planned | Future client priority, including a blocking client
C# client | Planned | Future client priority after Python
Go client | Planned | Future client priority after C#
Java client | Planned | Future client priority after Go