> subsystem-summary-of-catchup
read this skill for a token-efficient summary of the catchup subsystem
curl "https://skillshub.wtf/stellar/stellar-core/subsystem-summary-of-catchup?format=md"Catchup Subsystem — Technical Summary
The catchup subsystem in stellar-core is responsible for synchronizing a node's local ledger state with the rest of the network when it falls behind. It downloads historical data (ledger headers, transactions, and bucket snapshots) from history archives, verifies integrity, and applies the data to bring the node up to date.
All files reside in src/catchup/.
Key Classes and Data Structures
CatchupConfiguration
File: CatchupConfiguration.h/.cpp
Immutable configuration describing a catchup request. Parameterized by:
- `toLedger` — destination ledger number (or `CURRENT = 0` to resolve at runtime from the archive).
- `count` — number of ledgers to replay before the destination. `0` = minimal (buckets only), `UINT32_MAX` = complete history.
- `Mode` — one of `OFFLINE_BASIC`, `OFFLINE_COMPLETE`, or `ONLINE`.
Key methods:
- `resolve(uint32_t remoteCheckpoint)` — substitutes `CURRENT` with an actual checkpoint ledger number.
- `offline()` / `online()` — predicates for mode.
Helper free functions parseLedger() and parseLedgerCount() parse CLI strings.
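A minimal sketch of the configuration shape described above, assuming C++17. Field and method names follow this summary; the real class carries more state (e.g., archive selection) and is not reproduced here.

```cpp
// Sketch only: mirrors the fields and predicates summarized above.
#include <cstdint>
#include <limits>

enum class CatchupMode { OFFLINE_BASIC, OFFLINE_COMPLETE, ONLINE };

struct CatchupConfigurationSketch
{
    static constexpr uint32_t CURRENT = 0; // "resolve destination at runtime"
    static constexpr uint32_t COMPLETE = std::numeric_limits<uint32_t>::max();

    uint32_t toLedger; // destination ledger, or CURRENT
    uint32_t count;    // ledgers to replay before the destination
    CatchupMode mode;

    // Substitute CURRENT with the checkpoint ledger reported by the archive.
    CatchupConfigurationSketch resolve(uint32_t remoteCheckpoint) const
    {
        return {toLedger == CURRENT ? remoteCheckpoint : toLedger, count, mode};
    }

    bool offline() const { return mode != CatchupMode::ONLINE; }
    bool online() const { return mode == CatchupMode::ONLINE; }
};
```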
CatchupRange
File: CatchupRange.h/.cpp
Computed from CatchupConfiguration + the current LCL + HistoryManager. Decides what the catchup must do:
| Field | Meaning |
|---|---|
| `mApplyBuckets` | Whether a bucket-apply phase is needed. |
| `mApplyBucketsAtLedger` | Checkpoint ledger at which to apply buckets (0 if `!mApplyBuckets`). |
| `mReplayRange` (`LedgerRange`) | Half-open range of ledgers to replay after bucket-apply. |
Five logical cases based on LCL position, requested count, and checkpoint boundaries (see comments in header). Invariants enforced by checkInvariants().
Key accessors: applyBuckets(), replayLedgers(), getBucketApplyLedger(), getReplayRange(), getFullRangeIncludingBucketApply().
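The sketch below is illustrative only: the real logic distinguishes the five cases above, while this captures just the basic "pure replay" vs. "bucket-apply then replay" split. `computeRange()`, `checkpointFreq`, and the checkpoint arithmetic are assumptions, not the actual implementation.

```cpp
// Simplified decision: replay from LCL+1 if possible, otherwise bucket-apply
// at a checkpoint boundary and replay the remainder.
#include <cstdint>

struct LedgerRangeSketch { uint32_t first; uint32_t limit; }; // half-open [first, limit)

struct CatchupRangeSketch
{
    bool applyBuckets;
    uint32_t applyBucketsAtLedger; // 0 when !applyBuckets
    LedgerRangeSketch replayRange;
};

CatchupRangeSketch
computeRange(uint32_t lcl, uint32_t toLedger, uint32_t count, uint32_t checkpointFreq)
{
    uint32_t firstWanted = toLedger >= count ? toLedger - count + 1 : 1;
    if (firstWanted <= lcl + 1)
    {
        // Requested span reaches back to the LCL: replay only, from LCL+1.
        return {false, 0, {lcl + 1, toLedger + 1}};
    }
    // Otherwise apply buckets at the checkpoint at or before firstWanted,
    // then replay the remainder (checkpoint arithmetic is an assumption).
    uint32_t checkpoint =
        firstWanted >= checkpointFreq
            ? (firstWanted / checkpointFreq) * checkpointFreq - 1
            : 0;
    if (checkpoint <= lcl)
    {
        return {false, 0, {lcl + 1, toLedger + 1}}; // degenerates to pure replay
    }
    return {true, checkpoint, {checkpoint + 1, toLedger + 1}};
}
```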
LedgerApplyManager / LedgerApplyManagerImpl
Files: LedgerApplyManager.h, LedgerApplyManagerImpl.h/.cpp
Abstract interface + concrete implementation. This is the top-level coordinator between the consensus layer (Herder) and the catchup/apply machinery. Owned by Application.
Key data members (Impl):
- `mCatchupWork` — `shared_ptr<CatchupWork>`, the running catchup work item (null when not catching up).
- `mSyncingLedgers` — `map<uint32_t, LedgerCloseData>`, buffer of ledgers received from the network that cannot be applied yet. Has strict invariants: either empty, starts at LCL+1, or contains at most 65 ledgers within a checkpoint boundary.
- `mLastQueuedToApply` — tracks the highest ledger sequence queued for application.
- `mLargestLedgerSeqHeard` — the highest ledger seq ever received.
- `mMetrics` (`CatchupMetrics`) — counters for archive states downloaded, checkpoints, ledgers verified, buckets downloaded/applied, tx sets downloaded/applied.
- `mCatchupFatalFailure` — set when catchup fails unrecoverably (e.g., incompatible core version).
- `MAX_EXTERNALIZE_LEDGER_APPLY_DRIFT = 12` — maximum ledger drift allowed before entering catchup in parallel-close mode.
Key methods:
- `processLedger(LedgerCloseData, isLatestSlot)` — main entry point called by Herder/LedgerManager when a new consensus ledger arrives. Logic (see the sketch below):
  - If catchup is done, resets `mCatchupWork`.
  - If ledger is old (≤ `mLastQueuedToApply`), skip.
  - If ledger is the next sequential one and no catchup running → `tryApplySyncingLedgers()`.
  - Otherwise buffers the ledger, trims the buffer, and decides whether to `startOnlineCatchup()`.
- `startCatchup(CatchupConfiguration, archive)` — schedules a `CatchupWork` on the `WorkScheduler`.
- `startOnlineCatchup()` — constructs a `CatchupConfiguration` targeting `firstBuffered - 1` in ONLINE mode.
- `trimSyncingLedgers()` — garbage-collects old entries from `mSyncingLedgers`, keeping at most one checkpoint's worth plus one.
- `tryApplySyncingLedgers()` — iterates sequential ledgers in `mSyncingLedgers` and applies them via `LedgerManager::applyLedger()`. In parallel-close mode, posts work to the ledger-close thread.
- `maybeGetNextBufferedLedgerToApply()` — returns the next buffered ledger (LCL+1) if available; used by `ApplyBufferedLedgersWork`.
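A hedged sketch of the `processLedger()` decision flow described above, using simplified stand-in types. The real method also updates metrics, handles parallel-close mode, and checks for fatal failures; `shouldStartCatchup()` and the checkpoint size of 64 are assumptions for illustration.

```cpp
#include <cstdint>
#include <map>

struct LedgerCloseDataSketch { uint32_t seq; /* tx set, SCP value, ... */ };

class LedgerApplyCoordinatorSketch
{
    std::map<uint32_t, LedgerCloseDataSketch> mSyncingLedgers;
    uint32_t mLastQueuedToApply = 0;
    bool mCatchupRunning = false;

    bool isNextToApply(uint32_t seq) const { return seq == mLastQueuedToApply + 1; }
    void tryApplySyncingLedgers() { /* apply sequential entries, bump mLastQueuedToApply */ }
    void trimSyncingLedgers() { /* keep at most ~one checkpoint's worth plus one */ }
    bool shouldStartCatchup() const { return mSyncingLedgers.size() >= 64; } // assumption
    void startOnlineCatchup() { mCatchupRunning = true; /* schedule catchup work */ }

  public:
    void processLedger(LedgerCloseDataSketch const& lcd)
    {
        if (lcd.seq <= mLastQueuedToApply)
        {
            return; // already applied or queued
        }
        mSyncingLedgers.emplace(lcd.seq, lcd);
        if (isNextToApply(lcd.seq) && !mCatchupRunning)
        {
            tryApplySyncingLedgers(); // apply it and any buffered successors
            return;
        }
        trimSyncingLedgers(); // fell behind: keep the buffer bounded
        if (!mCatchupRunning && shouldStartCatchup())
        {
            startOnlineCatchup();
        }
    }
};
```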
CatchupWork
File: CatchupWork.h/.cpp
The central Work subclass orchestrating all catchup steps. Extends Work (composite work pattern).
Key data members:
- `mLocalState` (`HistoryArchiveState`) — local BucketList state at catchup start.
- `mDownloadDir` (`unique_ptr<TmpDir>`) — temporary directory for downloaded files.
- `mLiveBuckets`, `mHotBuckets` — maps from hash → downloaded Bucket objects.
- `mCatchupConfiguration` — the resolved configuration.
- `mGetHistoryArchiveStateWork`, `mGetBucketStateWork` — work to fetch HAS from archive.
- `mDownloadVerifyLedgersSeq` — work sequence for downloading + verifying ledger headers.
- `mVerifyLedgers` (`VerifyLedgerChainWork`) — verifies ledger chain integrity.
- `mBucketVerifyApplySeq` — work sequence for downloading, verifying, and applying buckets.
- `mTransactionsVerifyApplySeq` (`DownloadApplyTxsWork`) — work for downloading and applying transactions.
- `mApplyBufferedLedgersWork` — applies buffered network ledgers after catchup replay.
- `mCatchupSeq` — final composite work sequence.
- `mVerifiedLedgerRangeStart` (`LedgerHeaderHistoryEntry`) — the verified ledger at the start of the catchup range (used for bucket-apply).
- `mFatalFailureFuture` — shared_future indicating unrecoverable failure.
Key control flow (`runCatchupStep()` / `doWork()`); a simplified sequence-assembly sketch follows the list:
- Get HAS — `getAndMaybeSetHistoryArchiveState()` fetches the remote history archive state, validates network passphrase, checks that target > LCL.
- Resolve CatchupRange — from config + HAS + LCL.
- Get bucket HAS — `getAndMaybeSetBucketHistoryArchiveState()` if bucket-apply is needed and the bucket HAS differs from the main HAS.
- Download & verify ledger chain — `downloadVerifyLedgerChain()` spawns `BatchDownloadWork` + `VerifyLedgerChainWork` in a `WorkSequence`.
- Build catchup sequence — after ledger verification succeeds:
  - If `applyBuckets()`: `downloadApplyBuckets()` → `DownloadBucketsWork` + `ApplyBucketsWork`.
  - If `replayLedgers()`: `downloadApplyTransactions()` → `DownloadApplyTxsWork`.
  - A Herder consistency check work is prepended.
- Bucket-apply completion — calls `LedgerManager::setLastClosedLedger()` with the verified state, clears rebuild flags.
- Apply buffered ledgers — after the main catchup sequence succeeds, `ApplyBufferedLedgersWork` drains `mSyncingLedgers`.
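A simplified sketch of how the overall sequence is assembled, using plain `std::function` steps instead of the Work/WorkSequence framework; the step names mirror the flow above, but the composition API shown here is an assumption.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct StepSketch
{
    std::string name;
    std::function<bool()> run; // placeholder for the real work item
};

std::vector<StepSketch>
buildCatchupSequence(bool applyBuckets, bool replayLedgers, bool online)
{
    std::vector<StepSketch> seq;
    seq.push_back({"get-history-archive-state", [] { return true; }});
    seq.push_back({"download-verify-ledger-chain", [] { return true; }});
    // Herder consistency check precedes the apply phases.
    seq.push_back({"herder-consistency-check", [] { return true; }});
    if (applyBuckets)
        seq.push_back({"download-apply-buckets", [] { return true; }});
    if (replayLedgers)
        seq.push_back({"download-apply-transactions", [] { return true; }});
    if (online)
        seq.push_back({"apply-buffered-ledgers", [] { return true; }});
    return seq;
}

int main()
{
    for (auto const& s : buildCatchupSequence(true, true, true))
        std::cout << s.name << (s.run() ? ": ok" : ": failed") << "\n";
}
```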
Constants: PUBLISH_QUEUE_UNBLOCK_APPLICATION = 8, PUBLISH_QUEUE_MAX_SIZE = 16 — flow-control the publish queue during catchup.
VerifyLedgerChainWork
File: VerifyLedgerChainWork.h/.cpp
BasicWork subclass that verifies a range of downloaded ledger header files. Processes checkpoints from highest to lowest, linking each checkpoint's hash chain to the next.
Key data members:
- `mDownloadDir`, `mRange`, `mCurrCheckpoint` — the files to verify and current position.
- `mLastClosed` (`LedgerNumHashPair`) — local LCL for consistency checks.
- `mTrustedMaxLedger` (`shared_future<LedgerNumHashPair>`) — trusted hash from SCP consensus for the range end.
- `mVerifiedAhead` (`LedgerNumHashPair`) — hash-link propagation between checkpoint verifications.
- `mVerifiedMinLedgerPrev` (promise) — outgoing: the hash just before the verified range, so bucket-apply can validate.
- `mMaxVerifiedLedgerOfMinCheckpoint` — the max ledger of the lowest checkpoint; used by CatchupWork as `mVerifiedLedgerRangeStart`.
- `mFatalFailurePromise` — set when a mismatch against trusted hash is detected.
- `mChainDisagreesWithLocalState` — records local-state disagreements (e.g., bad LCL hash, incompatible version).
Key method — verifyHistoryOfSingleCheckpoint():
- Opens the checkpoint ledger header file.
- Iterates entries, verifying each ledger header hash and link to the previous.
- At the range end, verifies against `mTrustedMaxLedger`.
- At each checkpoint boundary, checks hash-chain linkage with `mVerifiedAhead`.
- On the lowest checkpoint, writes hash-link to `mVerifiedMinLedgerPrev` and records `mMaxVerifiedLedgerOfMinCheckpoint`.
- Checks local state (LCL hash, protocol version) and records disagreements.
onRun():
Calls verifyHistoryOfSingleCheckpoint() once per crank. On success, decrements mCurrCheckpoint and returns WORK_RUNNING until all checkpoints are verified. Maps various error statuses to WORK_FAILURE with appropriate log messages.
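A sketch of backward hash-chain verification over one span of headers, with a simplified header type. The real work verifies one checkpoint file per crank, also checks the local LCL and protocol version, and reports richer error states.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct HeaderSketch
{
    uint32_t seq;
    std::string hash;         // hash of this header
    std::string previousHash; // expected hash of header seq-1
};

// Verify headers (ascending by seq) against a trusted hash for the last one.
// Returns the hash the ledger just below headers.front() must have, so the
// caller can keep linking trust into the next (lower) checkpoint.
std::string
verifyChainBackwards(std::vector<HeaderSketch> const& headers,
                     std::string const& trustedTopHash)
{
    if (headers.empty())
        throw std::runtime_error("empty checkpoint");
    if (headers.back().hash != trustedTopHash)
        throw std::runtime_error("top header disagrees with trusted hash");
    for (std::size_t i = headers.size() - 1; i > 0; --i)
    {
        if (headers[i].seq != headers[i - 1].seq + 1 ||
            headers[i].previousHash != headers[i - 1].hash)
        {
            throw std::runtime_error("broken hash link at ledger " +
                                     std::to_string(headers[i].seq));
        }
    }
    return headers.front().previousHash; // trust now extends one ledger lower
}
```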
DownloadApplyTxsWork
File: DownloadApplyTxsWork.h/.cpp
BatchWork subclass that iterates over checkpoints in a replay range, yielding a work sequence per checkpoint: download → unzip → apply.
Key data members:
- `mRange` (`LedgerRange`) — the half-open replay range.
- `mDownloadDir` — shared temp directory.
- `mLastApplied` (`LedgerHeaderHistoryEntry&`) — reference to the last applied header (updated on success).
- `mCheckpointToQueue` — next checkpoint to schedule.
- `mLastYieldedWork` — the previous checkpoint's work, used for sequencing.
- `mWaitForPublish` — if true, gates application on publish queue size.
yieldMoreWork():
For each checkpoint:
- Creates `GetAndUnzipRemoteFileWork` for the transaction file.
- Creates `ApplyCheckpointWork` for the ledger range within that checkpoint.
- Wraps application in a `ConditionalWork` that:
  - Waits for the previous checkpoint's work to finish.
  - Optionally waits for the publish queue to drain below `PUBLISH_QUEUE_MAX_SIZE`.
  - Optionally waits for BucketList merges.
- Appends cleanup work to delete temporary files.
- Returns the whole sequence as a `WorkSequence` (see the sketch below).
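A sketch of the per-checkpoint yielding pattern under simplified assumptions: each yielded item is gated on the previous one being done, which is the role ConditionalWork plays in the description above. The types and the `canStart` predicate are stand-ins, not the BatchWork/ConditionalWork API.

```cpp
#include <cstdint>
#include <functional>
#include <memory>

struct CheckpointWorkSketch
{
    std::function<bool()> canStart; // gate: is the previous checkpoint finished?
    bool done = false;
    void run() { /* download, unzip, apply checkpoint, cleanup */ done = true; }
};

class DownloadApplyTxsSketch
{
    uint32_t mCheckpointToQueue;
    uint32_t mLastCheckpoint;
    uint32_t mFrequency;
    std::shared_ptr<CheckpointWorkSketch> mLastYielded;

  public:
    DownloadApplyTxsSketch(uint32_t first, uint32_t last, uint32_t freq)
        : mCheckpointToQueue(first), mLastCheckpoint(last), mFrequency(freq)
    {
    }

    bool hasNext() const { return mCheckpointToQueue <= mLastCheckpoint; }

    // Yield the work for the next checkpoint, gated on the previous one.
    std::shared_ptr<CheckpointWorkSketch> yieldMoreWork()
    {
        auto prev = mLastYielded;
        auto next = std::make_shared<CheckpointWorkSketch>();
        next->canStart = [prev] { return !prev || prev->done; };
        mLastYielded = next;
        mCheckpointToQueue += mFrequency;
        return next;
    }
};
```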
ApplyCheckpointWork
File: ApplyCheckpointWork.h/.cpp
BasicWork subclass that applies transactions from a single checkpoint (at most one checkpoint worth of ledgers).
Key data members:
- `mDownloadDir` — temp dir with ledger + tx files.
- `mLedgerRange` — the aligned ledger range to apply.
- `mCheckpoint` — the checkpoint number.
- `mHdrIn`, `mTxIn` — XDR input streams for ledger headers and transactions.
- `mConditionalWork` — wraps `ApplyLedgerWork` in a conditional that waits for BucketList merge futures to resolve.
Key control flow (onRun()):
- If a conditional work is active, cranks it. On success, verifies the resulting LCL hash matches the expected header hash.
- Checks if done (all ledgers in range applied).
- Opens input files if needed.
- Calls `getNextLedgerCloseData()`, which reads the next header from file, performs knitting checks (skip old, verify LCL hash continuity, verify tx set hash), and constructs a `LedgerCloseData` (sketched below).
- Creates `ApplyLedgerWork` wrapped in a `ConditionalWork` that waits for BucketList merge futures.
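A sketch of the "knitting" checks performed while streaming a checkpoint, with simplified record types in place of the XDR streams. `nextLedgerToApply`, `HeaderRec`, and `TxSetRec` are illustrative assumptions; the real code builds a full LedgerCloseData and handles sparse tx files.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>

struct HeaderRec { uint32_t seq; std::string hash, previousHash, txSetHash; };
struct TxSetRec  { uint32_t seq; std::string hash; };

std::optional<HeaderRec>
nextLedgerToApply(std::function<std::optional<HeaderRec>()> readHeader,
                  std::function<std::optional<TxSetRec>()> readTxSet,
                  uint32_t lclSeq, std::string const& lclHash)
{
    while (auto hdr = readHeader())
    {
        if (hdr->seq <= lclSeq)
            continue; // skip ledgers already applied locally
        if (hdr->seq != lclSeq + 1 || hdr->previousHash != lclHash)
            throw std::runtime_error("header does not knit onto the LCL");
        auto txs = readTxSet();
        if (!txs || txs->seq != hdr->seq || txs->hash != hdr->txSetHash)
            throw std::runtime_error("tx set mismatch for ledger " +
                                     std::to_string(hdr->seq));
        return hdr; // ready to be wrapped in an apply step
    }
    return std::nullopt; // checkpoint exhausted
}
```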
ApplyLedgerWork
File: ApplyLedgerWork.h/.cpp
Minimal BasicWork subclass. onRun() calls LedgerManager::applyLedger(lcd, false) to close a single ledger. No retry.
ApplyBucketsWork
File: ApplyBucketsWork.h/.cpp
Work subclass that applies bucket snapshot state to the database.
Key data members:
- `mBuckets` — map of hash → LiveBucket (downloaded buckets).
- `mApplyState` (`HistoryArchiveState`) — the archive state to apply.
- `mBucketsToApply` — ordered vector of buckets (L0 curr, L0 snap, L1 curr, ...).
- `mBucketApplicator` — the active `BucketApplicator` instance.
- `mSeenKeys`, `mSeenKeysBeforeApply` — deduplication sets to ensure only the newest version of each entry is written.
- `mIndexBucketsWork` — child work to index bucket files (runs first).
- `mAssumeStateWork` — child work to assume BucketList state (runs after all buckets applied).
Key control flow (doWork()):
- Index buckets — spawns `IndexBucketsWork<LiveBucket>` on first call.
- Apply buckets — iterates through `mBucketsToApply` in order, using `BucketApplicator` to incrementally write entries to the database. Entries already in `mSeenKeys` are skipped (ensures newest-version-wins; see the sketch below). After each bucket, runs invariant checks.
- Assume state — spawns `AssumeStateWork`, which indexes both live and hot archive buckets, then calls `BucketManager::assumeState()` to set the BucketList to the target state and restart merges.
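A sketch of newest-version-wins application using a seen-keys set. `EntrySketch` and the string-keyed "database" are simplified stand-ins for bucket entries and the real BucketApplicator.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct EntrySketch { std::string key; std::string value; bool dead; };
using BucketSketch = std::vector<EntrySketch>;

// Buckets ordered newest-first (L0 curr, L0 snap, L1 curr, ...): the first
// version of a key encountered wins; older versions are skipped.
std::map<std::string, std::string>
applyBuckets(std::vector<BucketSketch> const& bucketsNewestFirst)
{
    std::map<std::string, std::string> db;
    std::set<std::string> seenKeys;
    for (auto const& bucket : bucketsNewestFirst)
    {
        for (auto const& e : bucket)
        {
            if (!seenKeys.insert(e.key).second)
                continue;            // older version of a key already handled
            if (!e.dead)
                db[e.key] = e.value; // live entry: write the newest version
            // dead entries only mark the key as seen so older copies are dropped
        }
    }
    return db;
}
```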
AssumeStateWork
File: AssumeStateWork.h/.cpp
Work subclass spawned at the end of ApplyBucketsWork. Holds strong references to all buckets in the HAS (including future buckets from pending merges) to prevent garbage collection during indexing.
doWork():
- Spawns `IndexBucketsWork<LiveBucket>` and `IndexBucketsWork<HotArchiveBucket>`.
- Spawns a callback work that calls `BucketManager::assumeState()` and `InvariantManager::checkAfterAssumeState()`.
- Returns `checkChildrenStatus()`.
IndexBucketsWork<BucketT>
File: IndexBucketsWork.h/.cpp
Template Work subclass that indexes bucket files in parallel. For each non-empty, non-indexed bucket, spawns an IndexWork child.
IndexWork (inner class):
- Posts indexing to a background thread via `postOnBackgroundThread`.
- Tries to load a persisted index file first; if corrupt or outdated, creates a fresh index via `createIndex<BucketT>()`.
- On completion, posts the result back to the main thread and calls `BucketManager::maybeSetIndex()` (handoff sketched below).
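A sketch of the background-indexing handoff: build an index off the main thread, then install the result back on the "main thread". `std::async` stands in for the background thread pool and main-thread posting used by the real work; `buildIndex` and `IndexSketch` are illustrative.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <string>
#include <vector>

using IndexSketch = std::map<std::string, std::size_t>; // key -> file offset

IndexSketch
buildIndex(std::vector<std::string> const& keys)
{
    IndexSketch idx;
    for (std::size_t i = 0; i < keys.size(); ++i)
        idx[keys[i]] = i; // in reality: offsets into the bucket file
    return idx;
}

int main()
{
    std::vector<std::string> bucketKeys = {"a", "b", "c"};
    // "postOnBackgroundThread": start indexing off the main thread.
    auto fut = std::async(std::launch::async, buildIndex, bucketKeys);
    // ... the main thread keeps cranking other work here ...
    // "postOnMainThread": collect and install the finished index.
    IndexSketch idx = fut.get();
    return idx.size() == 3 ? 0 : 1;
}
```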
ApplyBufferedLedgersWork
File: ApplyBufferedLedgersWork.h/.cpp
BasicWork subclass used at the end of catchup to drain mSyncingLedgers. On each onRun():
- Checks if the previous `ConditionalWork` is done.
- Asks `LedgerApplyManager::maybeGetNextBufferedLedgerToApply()` for the next ledger.
- Wraps `ApplyLedgerWork` in a `ConditionalWork` that waits for BucketList merge futures.
- Returns `WORK_SUCCESS` when no more buffered ledgers are available (see the sketch below).
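A sketch of the drain loop as a simple polling state machine rather than the BasicWork API; `mergesResolved()` is a placeholder for the BucketList merge gate, and the deque stands in for `mSyncingLedgers`.

```cpp
#include <cstdint>
#include <deque>
#include <optional>

enum class DrainState { RUNNING, SUCCESS };

struct BufferedLedgerDrainSketch
{
    std::deque<uint32_t> buffered;    // stand-in for the syncing-ledger buffer
    std::optional<uint32_t> inFlight; // ledger currently gated behind merges

    bool mergesResolved() const { return true; } // placeholder gate

    DrainState crank()
    {
        if (inFlight)
        {
            if (!mergesResolved())
                return DrainState::RUNNING; // wait for BucketList merges
            // ... apply *inFlight via the per-ledger apply step here ...
            inFlight.reset();
        }
        if (buffered.empty())
            return DrainState::SUCCESS;     // nothing left to drain
        inFlight = buffered.front();        // next buffered ledger (LCL+1)
        buffered.pop_front();
        return DrainState::RUNNING;
    }
};
```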
ReplayDebugMetaWork
File: ReplayDebugMetaWork.h/.cpp
Work subclass for offline replay of debug meta files (used in diagnostic scenarios). Iterates sorted debug meta files, optionally gunzips them, and spawns ApplyLedgersFromMetaWork (inner helper class) to read LedgerCloseMeta entries and apply them via ApplyLedgerWork. Can also apply a final StoredDebugTransactionSet for the latest ledger.
Key Data Flows
Online Catchup Flow
```
Herder (consensus)
  │
  ▼
LedgerApplyManagerImpl::processLedger()
  │
  ├─ If sequential with LCL → tryApplySyncingLedgers() → LedgerManager::applyLedger()
  │
  └─ If behind → buffer in mSyncingLedgers
       │
       └─ When checkpoint boundary reached → startOnlineCatchup()
            │
            ▼
CatchupWork (scheduled on WorkScheduler)
  │
  ├─ 1. GetHistoryArchiveStateWork → fetch remote HAS
  ├─ 2. Compute CatchupRange
  ├─ 3. downloadVerifyLedgerChain()
  │     ├─ BatchDownloadWork (ledger header files)
  │     └─ VerifyLedgerChainWork (hash-chain verification, highest→lowest)
  ├─ 4a. downloadApplyBuckets() [if applyBuckets()]
  │     ├─ DownloadBucketsWork
  │     ├─ verify HAS
  │     └─ ApplyBucketsWork
  │          ├─ IndexBucketsWork
  │          ├─ BucketApplicator (per bucket, level by level)
  │          └─ AssumeStateWork
  ├─ 4b. downloadApplyTransactions() [if replayLedgers()]
  │     └─ DownloadApplyTxsWork (per checkpoint)
  │          ├─ GetAndUnzipRemoteFileWork
  │          └─ ApplyCheckpointWork
  │               └─ ApplyLedgerWork (per ledger)
  └─ 5. ApplyBufferedLedgersWork → drain mSyncingLedgers
```
Offline Catchup Flow
Same as the online flow, but triggered directly by startCatchup() (not by buffered ledgers); the mode is OFFLINE_BASIC or OFFLINE_COMPLETE; there is no ApplyBufferedLedgersWork; and in OFFLINE_COMPLETE mode, DownloadVerifyTxResultsWork also runs for full validation.
Threading Model
- All `LedgerApplyManagerImpl` methods assert `threadIsMain()` — the catchup coordinator runs entirely on the main thread.
- The `Work`/`BasicWork` framework is cranked on the main thread's event loop.
- `IndexBucketsWork::IndexWork` posts indexing tasks to a background thread pool via `postOnBackgroundThread()`, and posts results back to the main thread via `postOnMainThread()`.
- In parallel-close mode (`parallelLedgerClose()`), `tryApplySyncingLedgers()` posts `applyLedger` calls to the ledger-close thread.
- `ApplyCheckpointWork` and `ApplyBufferedLedgersWork` use `ConditionalWork` to poll for BucketList merge future resolution before applying ledgers, preventing application while background merges are pending.
- `VerifyLedgerChainWork` uses `std::promise`/`std::shared_future` for inter-work communication: the trusted max-ledger hash is passed in via `shared_future`, and the verified min-ledger-prev hash is passed out via `promise` (see the sketch below).
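A sketch of the promise/shared_future handoff pattern described above: one side publishes a (ledger, hash) pair, the other polls for it before use. The pair type and the values are simplified stand-ins.

```cpp
#include <chrono>
#include <cstdint>
#include <future>
#include <string>
#include <utility>

using LedgerNumHashPairSketch = std::pair<uint32_t, std::string>;

int main()
{
    std::promise<LedgerNumHashPairSketch> trusted;
    std::shared_future<LedgerNumHashPairSketch> fut = trusted.get_future().share();

    // Producer side: fulfil the promise once the trusted hash is known.
    trusted.set_value({1000, "deadbeef"});

    // Consumer side: poll the shared_future before trusting the range end.
    if (fut.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
    {
        auto [seq, hash] = fut.get();
        return (seq == 1000 && !hash.empty()) ? 0 : 1;
    }
    return 1;
}
```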
Ownership Relationships
```
Application
 └─ LedgerApplyManagerImpl (unique_ptr, via LedgerApplyManager::create)
     └─ mCatchupWork: shared_ptr<CatchupWork> (owned while catchup active)
         ├─ mDownloadDir: unique_ptr<TmpDir>
         ├─ mLiveBuckets / mHotBuckets: map<string, shared_ptr<Bucket>>
         ├─ mGetHistoryArchiveStateWork: shared_ptr
         ├─ mDownloadVerifyLedgersSeq: shared_ptr<WorkSequence>
         │    └─ mVerifyLedgers: shared_ptr<VerifyLedgerChainWork>
         ├─ mBucketVerifyApplySeq: shared_ptr<WorkSequence>
         │    └─ ApplyBucketsWork
         │        ├─ mIndexBucketsWork: shared_ptr<IndexBucketsWork>
         │        │    └─ IndexWork children (per bucket, background thread)
         │        ├─ mBucketApplicator: unique_ptr<BucketApplicator>
         │        └─ mAssumeStateWork: shared_ptr<AssumeStateWork>
         ├─ mTransactionsVerifyApplySeq: shared_ptr<DownloadApplyTxsWork>
         │    └─ per-checkpoint WorkSequence children
         │        ├─ GetAndUnzipRemoteFileWork
         │        └─ ApplyCheckpointWork
         │            └─ mConditionalWork → ApplyLedgerWork
         ├─ mApplyBufferedLedgersWork: shared_ptr<ApplyBufferedLedgersWork>
         └─ mCatchupSeq: shared_ptr<WorkSequence> (final composite)
```
LedgerApplyManagerImpl also owns mSyncingLedgers (the ledger buffer) independently of CatchupWork.
Key Invariants and Error Handling
- `CatchupRange::checkInvariants()` ensures at least one of bucket-apply or replay is active, and validates sequencing between them (see the sketch after this list).
- Hash-chain verification in `VerifyLedgerChainWork` is done backwards (highest checkpoint first) to propagate trust from the SCP-consensus hash downward.
- If `VerifyLedgerChainWork` detects a mismatch with a trusted SCP hash, it sets `mFatalFailurePromise` to true, causing `CatchupWork::fatalFailure()` to return true and `LedgerApplyManagerImpl` to set `mCatchupFatalFailure`, permanently blocking further catchup attempts.
- `ApplyCheckpointWork` validates that the resulting LCL hash matches the expected ledger header after each ledger application.
- Publish queue flow control in `DownloadApplyTxsWork` prevents the publish queue from growing beyond `PUBLISH_QUEUE_MAX_SIZE` by gating `ApplyCheckpointWork` behind a `ConditionalWork`.
- BucketList merge futures are awaited (via `ConditionalWork`) before applying any ledger, both during checkpoint replay and buffered ledger application.
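A sketch of the kind of sequencing invariants described above, reusing the simplified range shape from the earlier CatchupRange sketch; the concrete assertions are assumptions for illustration, not the actual checks.

```cpp
#include <cassert>
#include <cstdint>

struct LedgerRangeSketch { uint32_t first; uint32_t limit; }; // half-open

struct CatchupRangeInvariantSketch
{
    bool applyBuckets;
    uint32_t applyBucketsAtLedger;
    LedgerRangeSketch replayRange;

    void checkInvariants() const
    {
        bool replays = replayRange.limit > replayRange.first;
        assert(applyBuckets || replays); // at least one phase must be active
        if (!applyBuckets)
            assert(applyBucketsAtLedger == 0);
        if (applyBuckets && replays)
            assert(replayRange.first == applyBucketsAtLedger + 1); // replay follows buckets
    }
};
```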
> related_skills --same-repo
> validating-a-change
comprehensive validation of a change to ensure it is correct and ready for a pull request
> regenerating a technical summary of stellar-core
Instructions for regenerating the full set of subsystem and whole-system technical summary skill documents for stellar-core
> subsystem-summary-of-work
read this skill for a token-efficient summary of the work subsystem
> subsystem-summary-of-util
read this skill for a token-efficient summary of the util subsystem