Configuration

Fibril has two configuration layers:

  • Startup config decides how the server process starts.
  • Runtime settings decide live broker behavior and are persisted after first boot.

That split matters. Startup config is for things the process needs before it can run, such as bind addresses and the data directory. Runtime settings are for behavior that can be changed while the broker is running, such as delivery timing and idle queue cleanup.

The config file format is TOML. The repository includes fibril.example.toml:

[server]
data_dir = "server_data"

[broker.listener]
bind = "0.0.0.0:9876"

[admin.listener]
bind = "0.0.0.0:8081"

[admin.auth]
enabled = false
username = "fibril"
# password = "change-me"

[storage.keratin]
fsync_interval_ms = 5
# Floor between storage commits while the fsync worker is idle. 0 self-clocks
# group commit on fsync completions, the best setting for fast storage (NVMe,
# tmpfs). On slow-fsync storage such as SATA SSDs a floor around the fsync
# interval (5) gives the drive breathing room between write barriers.
min_fsync_interval_ms = 0

[storage.keratin.message_log]
segment_max_bytes = 268435456

[storage.keratin.event_log]
segment_max_bytes = 33554432

[runtime_seed.delivery]
inflight_ttl_ms = 30000
expiry_poll_min_ms = 15000
expiry_batch_max = 8192
delivery_poll_max_ms = 5000

[runtime_seed.idle_queue_cleanup]
enabled = false
evict_after_ms = 600000
sweep_interval_ms = 60000
# Set this for sparse workloads with long-lived publishing connections.
# publisher_idle_timeout_ms = 600000

[runtime_seed.connection]
# Reconnect grace is on by default (5000 ms). Set to 0 to disable it.
# reconnect_grace_ms = 5000

[runtime_seed.replication]
confirm_timeout_ms = 5000
caught_up_poll_ms = 1000
retry_poll_ms = 100
checkpoint_retry_poll_ms = 5000
max_messages_per_read = 256
max_events_per_read = 256
max_bytes_per_read = 8388608
max_iterations_per_tick = 8
min_in_sync_replicas = 1
isr_timeout_ms = 10000
read_timeout_slack_ms = 10000
owner_connect_timeout_ms = 5000

[runtime_seed.partitioning]
default_partition_count = 1

[runtime_seed.consumer_groups]
# Blank or omit to disable the under-provisioned signal.
# default_target_per_consumer = 4

[coordination.ganglion]
heartbeat_interval_ms = 3000
liveness_ttl_ms = 9000

[runtime_locks]
idle_queue_cleanup = false

Run with a config file:

cargo run --release --bin fibril-server -- --config fibril.toml

or:

FIBRIL_CONFIG=fibril.toml cargo run --release --bin fibril-server

Startup config is resolved in this order:

compiled defaults < TOML config file < environment variables < CLI arguments

This precedence applies to startup fields and first-boot runtime seeds. It does not mean environment variables keep overriding persisted runtime settings after runtime state exists.
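The precedence chain can be pictured as a left-to-right merge in which later layers win. The sketch below is illustrative only: `resolve_startup` and the flat dotted keys are hypothetical names chosen for the example, not Fibril's internal representation.

```python
from typing import Any, Mapping


def resolve_startup(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Merge layers left to right; a later layer overrides an earlier one.

    A value of None is treated as "not set in this layer" (an assumption
    made for this sketch).
    """
    resolved: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


# Hypothetical values for each layer, lowest precedence first.
defaults = {"server.data_dir": "server_data", "broker.listener.bind": "0.0.0.0:9876"}
toml_file = {"server.data_dir": "/var/lib/fibril"}
env_vars = {"broker.listener.bind": "0.0.0.0:7000"}  # as if FIBRIL_BROKER_BIND were set
cli_args = {"server.data_dir": "/tmp/fibril-dev"}    # as if --data-dir were passed

config = resolve_startup(defaults, toml_file, env_vars, cli_args)
print(config)
```

Here the CLI value wins for `server.data_dir` and the environment value wins for `broker.listener.bind`, because no higher layer sets it.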

These fields are read on process start.

| TOML field | Env var | CLI flag | Default |
| --- | --- | --- | --- |
| server.data_dir | FIBRIL_DATA_DIR | --data-dir | server_data |
| broker.listener.bind | FIBRIL_BROKER_BIND | --broker-bind | 0.0.0.0:9876 |
| broker.listener.advertise | FIBRIL_BROKER_ADVERTISE | none | derived (see below) |
| admin.listener.bind | FIBRIL_ADMIN_BIND | --admin-bind | 0.0.0.0:8081 |
| admin.auth.enabled | FIBRIL_ADMIN_AUTH_ENABLED | --admin-auth-enabled | false |
| admin.auth.username | FIBRIL_ADMIN_USERNAME | --admin-username | fibril |
| admin.auth.password | FIBRIL_ADMIN_PASSWORD | --admin-password | unset |
| storage.keratin.fsync_interval_ms | FIBRIL_KERATIN_FSYNC_INTERVAL_MS | --keratin-fsync-interval-ms | 5 |
| storage.keratin.min_fsync_interval_ms | FIBRIL_KERATIN_MIN_FSYNC_INTERVAL_MS | --keratin-min-fsync-interval-ms | 0 |
| storage.keratin.message_log.segment_max_bytes | FIBRIL_KERATIN_MESSAGE_LOG_SEGMENT_MAX_BYTES | --keratin-message-log-segment-max-bytes | 268435456 |
| storage.keratin.event_log.segment_max_bytes | FIBRIL_KERATIN_EVENT_LOG_SEGMENT_MAX_BYTES | --keratin-event-log-segment-max-bytes | 33554432 |
| coordination.mode | FIBRIL_COORDINATION_MODE | none | static |
| coordination.ganglion.heartbeat_interval_ms | FIBRIL_COORDINATION_HEARTBEAT_INTERVAL_MS | none | 3000 |
| coordination.ganglion.liveness_ttl_ms | FIBRIL_COORDINATION_LIVENESS_TTL_MS | none | 9000 |
| coordination.ganglion.target_followers | none | none | 1 |
| coordination.ganglion.stream_replication_factor | none | none | 1 |
| coordination.ganglion.repartition_adoption_timeout_ms | none | none | 30000 |
| coordination.ganglion.assignment_durability | FIBRIL_COORDINATION_ASSIGNMENT_DURABILITY | none | local_durable |
| recovery.on_mismatch | FIBRIL_RECOVERY_ON_MISMATCH | none | quarantine |

Changing these generally requires restarting the server.

broker.listener.advertise is the address (or addresses) the broker tells peers and clients to reach it on, separate from bind. This matters when bind is not itself dialable. The common case is binding 0.0.0.0 in a container, which a peer cannot connect back to. Give it a routable host:port (a service name is fine, since it is resolved at connect time), or several comma-separated entries in FIBRIL_BROKER_ADVERTISE in priority order. Only the first entry is dialed today; the rest are carried for forward compatibility. When unset, it is derived in ganglion mode from this node’s coordination peer host plus the broker port, and otherwise falls back to bind. Standalone single-broker deployments do not need it, because clients connect to the broker directly.
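The fallback order for the advertised address can be sketched as follows. This is a rough model, not the broker's code: `advertised_addresses` and `dial_target` are hypothetical helpers, and taking the broker port from the bind address is an assumption made for the example.

```python
from typing import Optional


def advertised_addresses(
    advertise: Optional[str],
    bind: str,
    mode: str = "static",
    peer_host: Optional[str] = None,
) -> list[str]:
    """Return advertised addresses in priority order."""
    if advertise:
        # Explicit setting: comma-separated entries, highest priority first.
        entries = [entry.strip() for entry in advertise.split(",") if entry.strip()]
        if entries:
            return entries
    if mode == "ganglion" and peer_host:
        # Derived: coordination peer host plus the broker port.
        # Assumption: the port is the one in the bind address.
        port = bind.rsplit(":", 1)[1]
        return [f"{peer_host}:{port}"]
    # Otherwise fall back to the bind address itself.
    return [bind]


def dial_target(addresses: list[str]) -> str:
    """Only the first entry is dialed today; the rest are carried along."""
    return addresses[0]


explicit = advertised_addresses("broker-0:9876, 10.0.0.5:9876", "0.0.0.0:9876")
derived = advertised_addresses(None, "0.0.0.0:9876", mode="ganglion", peer_host="node-a")
fallback = advertised_addresses(None, "0.0.0.0:9876")
print(dial_target(explicit), derived, fallback)
```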

coordination.mode is static for a standalone single-broker deployment (the default) or ganglion to run the embedded coordinator and form a cluster. The coordination.ganglion.* settings only apply in ganglion mode. See clustering and replication.

coordination.ganglion.target_followers is the desired follower count per queue partition. coordination.ganglion.stream_replication_factor is the equivalent for DURABLE Plexus stream partitions. It is tuned separately so that stream and queue fault tolerance can differ. Only the durable tier replicates; the express tiers stay owner-only. A value of one keeps a durable stream available across a single node loss; zero makes durable streams owner-only (durable on disk, not HA). See Plexus streams. coordination.ganglion.assignment_durability is the default durability policy for new assignments (local_durable, replica_accepted, replica_durable, or majority_durable).

recovery.on_mismatch controls what happens when recovery finds a damaged queue log: quarantine (default) isolates the partition, refuse reports not ready, and ignore truncates to the last valid record. See recovery quarantine.

admin.auth.enabled = true requires both admin.auth.username and admin.auth.password. The admin password is intentionally not shown in the dashboard startup summary.

storage.keratin.message_log.segment_max_bytes and storage.keratin.event_log.segment_max_bytes are rollover thresholds. A segment rolls after an append crosses the configured size, so an individual segment can be slightly larger than this value.
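To see why a segment can overshoot its threshold, consider this small simulation. It is a sketch under stated assumptions: `simulate_segments` is a hypothetical helper, and it treats reaching the threshold as crossing it, which the text above does not specify.

```python
def simulate_segments(record_sizes: list[int], segment_max_bytes: int) -> list[int]:
    """Return the final size of each segment for a sequence of appends.

    The size check happens after the append, so the record that crosses
    the threshold still lands in the current segment.
    """
    segments: list[int] = []
    current = 0
    for size in record_sizes:
        current += size
        if current >= segment_max_bytes:  # assumption: reaching the limit counts
            segments.append(current)
            current = 0
    if current:
        segments.append(current)  # the still-open tail segment
    return segments


sizes = simulate_segments([40, 40, 40, 40, 40], segment_max_bytes=100)
print(sizes)
```

With five 40-byte records and a 100-byte threshold, the first segment closes at 120 bytes, which is above the configured value.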

coordination.ganglion.heartbeat_interval_ms controls how often a broker renews its cluster liveness record. coordination.ganglion.liveness_ttl_ms controls how long a broker can go without a fresh heartbeat before the cluster considers it unavailable. The TTL must be at least twice the heartbeat interval. For heavy replication benchmarks, a longer TTL can avoid false failover while the node is under artificial load.

coordination.ganglion.repartition_adoption_timeout_ms bounds how long the finalize step of a live repartition (retiring shrunk-away partitions and clearing the transition marker) waits for clients to adopt the new routing once the backlog has drained. Adoption is observed from client topology acks. The timeout keeps a silent or stuck client from stalling a cutover forever; publish version-fencing is the correctness backstop regardless. See live routing and cutover.

runtime_seed values initialize the persisted runtime settings document when no runtime settings exist yet.

After runtime settings exist, the persisted values own these settings. You can edit them through the admin settings page or the admin runtime settings API.
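The seeding rule can be modeled in a few lines. This is a minimal sketch: `load_runtime_settings` and the `version` and `settings` keys are hypothetical and do not describe the real persisted document layout.

```python
import copy
from typing import Any, Optional


def load_runtime_settings(
    persisted: Optional[dict[str, Any]],
    runtime_seed: dict[str, Any],
) -> dict[str, Any]:
    """Use the seed only when no runtime settings document exists yet."""
    if persisted is None:
        # First boot: the seed becomes the initial persisted document.
        return {"version": 1, "settings": copy.deepcopy(runtime_seed)}
    # Later boots: persisted values own the settings; the seed is ignored.
    return persisted


first_boot = load_runtime_settings(None, {"delivery": {"inflight_ttl_ms": 30000}})

# Changing the seed later has no effect once a document exists.
later_boot = load_runtime_settings(first_boot, {"delivery": {"inflight_ttl_ms": 60000}})
print(later_boot["settings"]["delivery"]["inflight_ttl_ms"])
```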

| TOML field | Default | Meaning |
| --- | --- | --- |
| runtime_seed.delivery.inflight_ttl_ms | 30000 | How long a delivered message lease lasts before it can be retried. |
| runtime_seed.delivery.expiry_poll_min_ms | 15000 | Minimum sleep between expiry checks when no earlier expiry is known. |
| runtime_seed.delivery.expiry_batch_max | 8192 | Maximum expired messages to requeue in one expiry pass. Must be at least 1. |
| runtime_seed.delivery.delivery_poll_max_ms | 5000 | Maximum idle poll delay for delivery loops. |

| TOML field | Env/CLI compatibility | Default | Meaning |
| --- | --- | --- | --- |
| runtime_seed.idle_queue_cleanup.enabled | enabled implicitly by FIBRIL_QUEUE_IDLE_EVICT_AFTER_MS or --queue-idle-evict-after-ms | false | Enables unloading idle queues from memory. |
| runtime_seed.idle_queue_cleanup.evict_after_ms | FIBRIL_QUEUE_IDLE_EVICT_AFTER_MS, --queue-idle-evict-after-ms | 600000 | How long a queue must be idle before cleanup can unload it. |
| runtime_seed.idle_queue_cleanup.sweep_interval_ms | FIBRIL_QUEUE_IDLE_SWEEP_INTERVAL_MS, --queue-idle-sweep-interval-ms | 60000 | How often the cleanup worker checks tracked queues. Must be at least 1. |
| runtime_seed.idle_queue_cleanup.publisher_idle_timeout_ms | FIBRIL_PUBLISHER_CACHE_IDLE_TIMEOUT_MS, --publisher-idle-timeout-ms | unset | Lets unused publishers stop keeping queues active while a connection remains open. |

For sparse workloads, enable publisher idle expiry alongside queue cleanup. Without it, a long-lived connection that published to a queue can keep that queue active until the connection closes.

See many idle queues for the user-facing behavior.

| TOML field | Env/CLI compatibility | Default | Meaning |
| --- | --- | --- | --- |
| runtime_seed.connection.reconnect_grace_ms | FIBRIL_RECONNECT_GRACE_MS, --reconnect-grace-ms | 5000 | Keeps a disconnected resumable client alive for this long before cleaning up subscriptions and requeueing unsettled messages. On by default so a transient blip resumes transparently; set 0 to disable. |

Reconnect grace defaults to 5000 ms when unset; set it to 0 to disable it. It only helps clients that use the resume identity handshake.

These settings apply to the experimental cluster replication path.

| TOML field | Default | Meaning |
| --- | --- | --- |
| runtime_seed.replication.confirm_timeout_ms | 5000 | How long a replica-durable publish confirm can wait for enough durable follower progress. |
| runtime_seed.replication.caught_up_poll_ms | 1000 | Follower pull interval while already caught up with the owner. Lower values can reduce idle replica-durable confirm latency at the cost of more wakeups. |
| runtime_seed.replication.retry_poll_ms | 100 | Follower retry interval after a partial pull or transient replication error. |
| runtime_seed.replication.checkpoint_retry_poll_ms | 5000 | Follower retry interval while it needs an owner checkpoint before it can continue. |
| runtime_seed.replication.max_messages_per_read | 256 | Maximum message records a follower asks the owner for in one pull. |
| runtime_seed.replication.max_events_per_read | 256 | Maximum event records a follower asks the owner for in one pull. |
| runtime_seed.replication.max_bytes_per_read | 8388608 | Approximate byte budget for one owner replication response. One oversized message can exceed it so replication still makes progress. |
| runtime_seed.replication.max_iterations_per_tick | 8 | Maximum pull/apply iterations a follower performs before yielding. |
| runtime_seed.replication.min_in_sync_replicas | 1 | Minimum recently in-sync replicas required before accepting replica-durable publishes. 1 disables the floor. |
| runtime_seed.replication.isr_timeout_ms | 10000 | How recently a follower must report durable progress to count as in sync. |
| runtime_seed.replication.read_timeout_slack_ms | 10000 | Slack added to a follower read’s long-poll window before the read is abandoned and the connection dropped, so a dropped owner response cannot hang the follower. A read waits max_wait_ms + this. |
| runtime_seed.replication.owner_connect_timeout_ms | 5000 | Upper bound on a follower establishing a connection to an owner (TCP connect plus the handshake) before it is abandoned and retried. |
| runtime_seed.replication.stream_enabled | true | Use credit-based streaming replication on followers. When disabled, followers fall back to polling pulls. |
| runtime_seed.replication.stream_apply_linger_us | 2000 | Microseconds a streaming follower gathers contiguous frames before one fsynced apply. Higher trades apply latency for fsync amortization. 0 is drain-only. |
| runtime_seed.replication.stream_apply_max_merge_bytes | 16777216 | Byte cap on a single coalesced streaming apply (peak memory versus fsync amortization). |
| runtime_seed.replication.stream_buffer_batches | 8 | In-flight batch buffer depth (credit window) for the streaming follower. Applied on the next stream. |

The read-budget settings are useful when tuning replica-durable throughput. Small values reduce per-tick work, while larger values let a follower catch up faster when the owner is receiving sustained traffic.

| TOML field | Default | Meaning |
| --- | --- | --- |
| runtime_seed.partitioning.default_partition_count | 1 | Partition count for a new queue declared without an explicit count. |
| runtime_seed.consumer_groups.default_target_per_consumer | unset | Optional soft target for exclusive consumer groups. When set, cohorts above the target can be reported as under-provisioned without reducing coverage. |

Runtime locks let startup config own a runtime setting group and prevent admin edits.

[runtime_locks]
idle_queue_cleanup = true

When idle_queue_cleanup is locked:

  • startup config controls the effective idle queue cleanup settings
  • the admin settings page shows the group as locked
  • admin update attempts for that group are rejected
  • updates to other runtime settings can still be saved

Use locks only when config management should intentionally own the setting. Do not use ordinary env vars as hidden runtime overrides.
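The lock behavior described above can be sketched as a guard on admin updates. The names here (`apply_admin_update`, `LockedGroupError`) are hypothetical, and rejecting the whole request when it touches any locked group is an assumption made for this example.

```python
from typing import Any


class LockedGroupError(Exception):
    """Raised when an admin update touches a locked settings group."""


def apply_admin_update(
    current: dict[str, dict[str, Any]],
    update: dict[str, dict[str, Any]],
    runtime_locks: dict[str, bool],
) -> dict[str, dict[str, Any]]:
    """Return new settings, refusing edits to groups owned by startup config."""
    locked = sorted(group for group in update if runtime_locks.get(group, False))
    if locked:
        raise LockedGroupError(f"locked by startup config: {', '.join(locked)}")
    # Copy so the caller's current settings are left untouched.
    merged = {group: dict(values) for group, values in current.items()}
    for group, values in update.items():
        merged.setdefault(group, {}).update(values)
    return merged


runtime_locks = {"idle_queue_cleanup": True}
current = {
    "delivery": {"inflight_ttl_ms": 30000},
    "idle_queue_cleanup": {"enabled": False},
}

# Updates to unlocked groups can still be saved.
updated = apply_admin_update(current, {"delivery": {"inflight_ttl_ms": 45000}}, runtime_locks)

# Updates to the locked group are rejected.
try:
    apply_admin_update(current, {"idle_queue_cleanup": {"enabled": True}}, runtime_locks)
    rejected = False
except LockedGroupError:
    rejected = True
print(updated["delivery"], rejected)
```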

The admin UI exposes a settings page at:

/admin/settings

The JSON API is:

GET /admin/api/runtime-settings
PUT /admin/api/runtime-settings

Update requests include an expected_version. If another operator changed settings first, the API returns 409 Conflict with the current settings instead of overwriting them.
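The expected_version check is a form of optimistic concurrency. The in-memory model below illustrates the semantics only: `RuntimeSettingsStore`, the integer version counter, and the response shape are assumptions for the sketch, not the actual API payload.

```python
from typing import Any


class RuntimeSettingsStore:
    """Toy model of a versioned settings document."""

    def __init__(self, settings: dict[str, Any]) -> None:
        self.version = 1
        self.settings = dict(settings)

    def get(self) -> dict[str, Any]:
        return {"version": self.version, "settings": dict(self.settings)}

    def put(self, expected_version: int, new_settings: dict[str, Any]) -> tuple[int, dict[str, Any]]:
        """Return (status, body). A stale expected_version yields 409 and the current document."""
        if expected_version != self.version:
            return 409, self.get()
        self.settings = dict(new_settings)
        self.version += 1
        return 200, self.get()


store = RuntimeSettingsStore({"inflight_ttl_ms": 30000})

# Two operators read the same version before either one saves.
seen_by_a = store.get()["version"]
seen_by_b = store.get()["version"]

status_a, _ = store.put(seen_by_a, {"inflight_ttl_ms": 45000})
status_b, body_b = store.put(seen_by_b, {"inflight_ttl_ms": 10000})
print(status_a, status_b, body_b)
```

The second operator receives the current document in the conflict response, so it can re-apply its change on top of the newer version instead of silently overwriting it.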

Some live settings are owned by storage-level state rather than the broker runtime settings document. The global dead-letter queue target is the current example.

The admin settings page also exposes:

GET /admin/api/global-dlq
PUT /admin/api/global-dlq

This setting:

  • applies live
  • is persisted in Fibril’s storage state
  • survives restart
  • uses expected_version and returns 409 Conflict if another update wins
  • is not seeded or overridden by TOML, environment variables, or CLI flags

See dead lettering for the setting shape and current limitations.

Current validation rules:

  • server.data_dir must not be empty
  • when admin.auth.enabled = true, admin.auth.username and admin.auth.password must both be set
  • storage.keratin.fsync_interval_ms must be at least 1
  • storage.keratin.message_log.segment_max_bytes must be at least 1
  • storage.keratin.event_log.segment_max_bytes must be at least 1
  • coordination.ganglion.heartbeat_interval_ms must be at least 1
  • coordination.ganglion.liveness_ttl_ms must be at least twice coordination.ganglion.heartbeat_interval_ms
  • runtime_seed.delivery.expiry_batch_max must be at least 1
  • runtime_seed.idle_queue_cleanup.sweep_interval_ms must be at least 1
  • runtime_seed.replication poll intervals and worker limits must be at least 1
  • runtime_seed.replication.min_in_sync_replicas must be at least 1
  • runtime_seed.replication.isr_timeout_ms must be at least 1
  • runtime_seed.partitioning.default_partition_count must be at least 1

More validation will be added as more settings become user-facing.
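A few of the rules above can be expressed as a small pre-flight check for a parsed config. This is a sketch, not the server's validator: `lookup` and `validate` are hypothetical helpers, only a subset of the rules is covered, and the error strings are illustrative. Defaults are taken from the startup table.

```python
from typing import Any


def lookup(cfg: dict[str, Any], dotted: str, default: Any = None) -> Any:
    """Fetch a nested value by dotted path, or the default when absent."""
    node: Any = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def validate(cfg: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the subset passed."""
    errors: list[str] = []
    if not lookup(cfg, "server.data_dir", "server_data"):
        errors.append("server.data_dir must not be empty")
    if lookup(cfg, "admin.auth.enabled", False):
        username = lookup(cfg, "admin.auth.username", "fibril")
        password = lookup(cfg, "admin.auth.password")
        if not username or not password:
            errors.append("admin.auth.username and admin.auth.password must both be set")
    at_least_one = {
        "storage.keratin.fsync_interval_ms": 5,
        "coordination.ganglion.heartbeat_interval_ms": 3000,
        "runtime_seed.partitioning.default_partition_count": 1,
    }
    for field, default in at_least_one.items():
        if lookup(cfg, field, default) < 1:
            errors.append(f"{field} must be at least 1")
    heartbeat = lookup(cfg, "coordination.ganglion.heartbeat_interval_ms", 3000)
    ttl = lookup(cfg, "coordination.ganglion.liveness_ttl_ms", 9000)
    if ttl < 2 * heartbeat:
        errors.append("liveness_ttl_ms must be at least twice heartbeat_interval_ms")
    return errors


ok = validate({})  # compiled defaults only
bad = validate({
    "admin": {"auth": {"enabled": True}},
    "coordination": {"ganglion": {"heartbeat_interval_ms": 3000, "liveness_ttl_ms": 5000}},
})
print(ok, bad)
```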