These FAQs relate to the MongoDB listener for change data capture (CDC), also referred to as the MongoDB CDC listener.
Before you begin:
- MongoDB CDC listener requires a Professional or Enterprise edition. See Celigo platform editions.
- You need an existing MongoDB connection, or you can create one during setup. See Set up a connection to MongoDB.
Every MongoDB export configured as a listener is its own consumer of the MongoDB change stream (oplog). A single export can listen to all collections, but if you have a high change volume, it may fall behind.
Best practice for scale:
- Create multiple exports, each with either:
  - Different collections (for example, one for users and one for products), or
  - Different aggregation pipelines (each export filters to its own subset).
This approach spreads the load across multiple consumers so that together they can keep up with the full volume of changes.
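As a rough illustration of the scaling advice above, the following Python sketch splits a list of collections across several exports round-robin. It assumes, hypothetically, that each export's Collections setting corresponds to Debezium's collection.include.list property (comma-separated database.collection names); the database name shop is made up.

```python
# Hypothetical sketch: spread collections across several listener exports.
# Assumption: each export's Collections setting corresponds to Debezium's
# collection.include.list property ("database.collection", comma-separated).


def partition_collections(database, collections, export_count):
    """Assign collections round-robin to export_count exports."""
    if export_count < 1:
        raise ValueError("export_count must be at least 1")
    buckets = [[] for _ in range(export_count)]
    for index, name in enumerate(collections):
        buckets[index % export_count].append(f"{database}.{name}")
    # Drop empty buckets so no export is created with nothing to listen to.
    return [
        {"collection.include.list": ",".join(bucket)}
        for bucket in buckets
        if bucket
    ]


if __name__ == "__main__":
    for config in partition_collections("shop", ["users", "products", "orders"], 2):
        print(config)
```

Round-robin ignores how busy each collection is; a collection with a much higher change volume might deserve an export of its own.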
For snapshot.mode, the drop-down list shows only:
- when_needed
- no_data
In Additional properties, advanced users can manually type any Debezium snapshot.mode value in the text field, such as initial, initial_only, or always.
That is:
- The drop-down list provides safe, common choices.
- The text field exposes the full set of Debezium options, if you know what you're doing.
The basic difference between these snapshot.mode values:
- no_data: Does not read existing data. Only captures new changes from the moment the listener starts.
- when_needed: On the first run (no cursor yet), performs an initial snapshot of matching collections, then streams new changes from that point. On later runs (cursor already exists), takes no new snapshot and simply continues from the existing cursor.
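The difference between these modes comes down to one question asked at startup: does the export already have a cursor? The Python function below is a simplified model of that decision, based on the documented meaning of each Debezium snapshot.mode value rather than on Celigo's or Debezium's code. It deliberately ignores two nuances: initial_only stops after the snapshot instead of streaming, and when_needed also re-snapshots if the stored offset is no longer available on the server.

```python
# Simplified model of whether a Debezium MongoDB listener snapshots existing
# data on start. Not Debezium source code; edge cases are intentionally ignored.


def snapshots_on_start(mode, has_cursor):
    """Return True if the listener would snapshot existing data on this start."""
    if mode == "no_data":
        return False  # never reads existing data
    if mode == "always":
        return True  # snapshots on every start
    if mode in ("initial", "initial_only", "when_needed"):
        return not has_cursor  # snapshot only when no cursor/offset exists yet
    raise ValueError(f"unknown snapshot.mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("no_data", "when_needed"):
        for has_cursor in (False, True):
            print(mode, has_cursor, snapshots_on_start(mode, has_cursor))
```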
Recommended: set snapshot.mode=when_needed in Additional properties on a new export for that collection.
- First run: No cursor exists. Debezium runs a full snapshot of the collection(s) and establishes a cursor.
- After that: The same export reads only new changes from that cursor onward.
So the simple pattern is:
- Create a new listener/export for the collection(s) you're onboarding.
- Set snapshot.mode=when_needed in Additional properties.
- Let it run once to load history and establish the cursor.
- Keep it running so it continues as a MongoDB listener for change data capture (CDC).
Note
Use no_data only if you never want this export to do a snapshot (for example, you did a separate bulk/historical load already).
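Because snapshot.mode is often typed by hand in Additional properties, a small check can catch typos such as when-needed before they reach the listener. The sketch below is hypothetical: it assumes Additional properties takes one key=value pair per line, and KNOWN_SNAPSHOT_MODES lists only the values mentioned in these FAQs, not every value your Debezium version may accept.

```python
# Hypothetical helper: render Additional properties text for an onboarding
# export and reject an unrecognized snapshot.mode.
# Assumption: Additional properties takes one key=value pair per line.
# KNOWN_SNAPSHOT_MODES is limited to the values named in these FAQs.

KNOWN_SNAPSHOT_MODES = {"when_needed", "no_data", "initial", "initial_only", "always"}


def render_additional_properties(properties):
    mode = properties.get("snapshot.mode")
    if mode is not None and mode not in KNOWN_SNAPSHOT_MODES:
        raise ValueError(f"unrecognized snapshot.mode: {mode!r}")
    # Sort keys so the output is stable and easy to diff.
    return "\n".join(f"{key}={value}" for key, value in sorted(properties.items()))


if __name__ == "__main__":
    print(render_additional_properties({"snapshot.mode": "when_needed"}))
```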
I started with when_needed for the users collection, then later added the products collection. Why didn't I get a historical load for the products collection?
Because by the time you added the products collection, the export already had a valid cursor.
- when_needed only snapshots when there is no cursor yet.
- Once a cursor exists, adding a new collection does not trigger another snapshot.
- Result: you get only new products changes and no historical products data.
This is the key "gotcha" to understand.
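The gotcha can be reproduced with a toy simulation. The class below is hypothetical and is not Celigo's implementation; it only captures the rule described above: the snapshot decision looks at whether a cursor exists, not at which collections were added since.

```python
# Toy simulation (not Celigo's implementation) of the when_needed gotcha.


class ToyListenerExport:
    def __init__(self, collections, snapshot_mode="when_needed"):
        self.collections = list(collections)
        self.snapshot_mode = snapshot_mode
        self.cursor = None  # stands in for the stored resume token/offset

    def start(self):
        """Return the collections snapshotted on this start."""
        snapshotted = []
        if self.snapshot_mode == "when_needed" and self.cursor is None:
            snapshotted = list(self.collections)
        self.cursor = "resume-token"  # streaming establishes/advances the cursor
        return snapshotted


export = ToyListenerExport(["users"])
print(export.start())  # first run: ['users']
export.collections.append("products")
print(export.start())  # cursor already exists: [] (no products history)
```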
Until there's a UI option to "reset cursor," there are two options to get historical data for a new collection if your export already has a cursor:
Option A – Temporarily change snapshot.mode on the existing export.
- Add the new collection (for example, products) to Collections.
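- In Additional properties, change snapshot.mode to a mode that forces a new snapshot, such as always. In Debezium, initial snapshots only when no offset has been stored yet, so it may not re-snapshot an export that already has a cursor.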
- Restart the export so it snapshots everything that matches (including the new collection).
- After the snapshot completes, switch snapshot.mode back to when_needed or no_data.
- Pros: A single export handles everything.
- Cons: Snapshots all collections in that export, not just the new one.
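Option A can be written down as two property sets applied in order. This Python sketch is hypothetical: it assumes Collections maps to Debezium's collection.include.list, and it uses always as the forcing mode because Debezium documents that initial snapshots only when no offset exists. The output also makes the "Cons" visible: the forced snapshot covers every listed collection, not just the new one.

```python
# Hypothetical runbook for Option A as ordered property sets.
# Assumption: Collections maps to Debezium's collection.include.list.


def option_a_phases(database, collections, new_collection, steady_mode="when_needed"):
    included = ",".join(f"{database}.{name}" for name in [*collections, new_collection])
    # Phase 1: force a snapshot of everything that matches, including the new collection.
    during_snapshot = {"collection.include.list": included, "snapshot.mode": "always"}
    # Phase 2: switch back; leaving "always" would re-snapshot on every restart.
    after_snapshot = {**during_snapshot, "snapshot.mode": steady_mode}
    return [during_snapshot, after_snapshot]


if __name__ == "__main__":
    for phase in option_a_phases("shop", ["users"], "products"):
        print(phase)
```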
Option B – Use a separate export for historical loads
- Create a new export only for historical data:
  - In Collections: only the new collection(s) you want the history for.
  - In Additional properties, set snapshot.mode=when_needed (or another snapshot mode you prefer).
- Run it once to load historical data.
- Disable/remove that export when finished.
- Add the new collection to your main CDC export.
  - It now only needs to capture new changes going forward.
- Pros: Keeps historical loading separate from the main, stable listener.
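For comparison, Option B produces two configurations: a temporary history export limited to the new collection(s) and an updated main export. The sketch assumes, hypothetically, that Collections maps to Debezium's collection.include.list; the main export keeps whatever snapshot.mode it already has, so that key is not set here.

```python
# Hypothetical sketch of Option B: a temporary history export plus the
# updated main CDC export. Assumption: Collections maps to collection.include.list.


def option_b_exports(database, main_collections, new_collections):
    def include_list(names):
        return ",".join(f"{database}.{name}" for name in names)

    history_export = {
        # Fresh export: no cursor yet, so when_needed takes a snapshot.
        "collection.include.list": include_list(new_collections),
        "snapshot.mode": "when_needed",
    }
    main_export = {
        # Existing cursor: adding collections here only captures new changes.
        "collection.include.list": include_list([*main_collections, *new_collections]),
    }
    return history_export, main_export


if __name__ == "__main__":
    history, main = option_b_exports("shop", ["users"], ["products"])
    print(history)
    print(main)
```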
Two layers control whether the before and after fields appear in your events:
- Debezium needs to provide the field.
  Debezium capture.mode controls whether before/after exist at all. For MongoDB, Debezium includes before/after for update and delete events only if capture.mode is set to the right value:
  - after on update events is present only when capture.mode is one of:
    - change_streams_update_full
    - change_streams_update_full_with_pre_image (Stack Overflow)
  - before on update/delete events is present only when capture.mode is one of the *_with_pre_image modes, for example:
    - change_streams_with_pre_image
    - change_streams_update_full_with_pre_image (Debezium)
  If your capture.mode is set to a mode that doesn't include them, Debezium never sends before/after in the event, so Celigo can't show them, no matter what you pick in Fields to include.
- The listener's Fields to include decides whether Celigo keeps the field.
  Fields to include is just a filter on top of Debezium.
The Fields to include drop-down list in the Create listener form for MongoDB lists the top-level fields of the Debezium event envelope that you can make available in your flow. Examples:
- after
- before
- op
- removedFields
- updatedFields
- ts_ms, ts_ns, ts_us
- schema, source, transaction, truncatedArrays, etc.
Think of it as a pre-transform field filter:
- If you don't select after, Celigo drops it even if Debezium is sending it, and you won't see it in the payload you map/transform.
- If you do select after but your capture.mode doesn't provide it, it will just be missing/null because Debezium never sent it.
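The two layers can be modeled as a set intersection: what Debezium sends for a given capture.mode, filtered by what you select in Fields to include. The Python below is a simplified model of the rules described above, not Debezium or Celigo code, and it covers only update ("u") and delete ("d") events.

```python
# Simplified two-layer model of before/after visibility (update/delete only).
# Layer 1: capture.mode decides what Debezium sends.
# Layer 2: Fields to include decides what Celigo keeps.

UPDATE_FULL_MODES = {"change_streams_update_full", "change_streams_update_full_with_pre_image"}
PRE_IMAGE_MODES = {"change_streams_with_pre_image", "change_streams_update_full_with_pre_image"}


def sent_by_debezium(capture_mode, op):
    """Which of before/after Debezium sends for an update ('u') or delete ('d')."""
    sent = set()
    if op == "u" and capture_mode in UPDATE_FULL_MODES:
        sent.add("after")
    if op in ("u", "d") and capture_mode in PRE_IMAGE_MODES:
        sent.add("before")
    return sent


def visible_in_flow(capture_mode, op, fields_to_include):
    """A field reaches the flow only if Debezium sent it AND it was selected."""
    return sent_by_debezium(capture_mode, op) & set(fields_to_include)


if __name__ == "__main__":
    print(visible_in_flow("change_streams", "u", {"before", "after"}))
    print(visible_in_flow("change_streams_update_full_with_pre_image", "u", {"after"}))
```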
Why does my MongoDB CDC listener break when I remove collections or make the aggregation pipeline more restrictive?
This usually happens because Celigo's MongoDB CDC (Debezium) is resuming from a stored cursor position (resume token/offset) that was created under the previous "shape" of the change stream.
- Debezium reads MongoDB change streams and stores progress (offset) so it can resume on restart. (Debezium)
- A MongoDB change stream is defined by an aggregation pipeline. When you resume with a resume token, MongoDB warns that you should use the same pipeline and options that were used to generate that token; changing them can prevent resuming or cause inconsistent/unpredictable behavior. (MongoDB)
Shrinking scope (more restrictive):
- Remove collections.
- Add stronger $match filters (filter out more events).
- Change the pipeline/options in a way that excludes what used to be included.
In these cases, the last stored resume token may refer to an event that the new stream definition does not include (or can't "land on" in the resumed stream). When Debezium restarts and tries to resume, MongoDB may fail to resume from that token, which is why it's referred to as a "bad cursor."
Recommended options for handling "shrinking" changes, which break the cursor:
- Option A – Start fresh (cleanest)
  - Create a new listener/export with the updated collection list/pipeline.
  - Let it establish a new cursor/offset (and do a snapshot if needed).
- Option B – Keep the listener stable; filter downstream
  - Keep the change stream pipeline as stable as possible.
  - Do projection/filtering later in your flow (mapping/transform), so you don't invalidate resuming.
- Option C – Reset the cursor
  - Plan for a cursor reset if the old resume token may not be reusable after tightening scope.
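To illustrate Option B, the following Python sketch drops unwanted events after they leave the (unchanged) listener instead of tightening the change stream. It is a hypothetical stand-in for a downstream filter, not Celigo's filter or hook syntax, and it assumes each record is a Debezium-style envelope whose source block carries a collection field and whose op is c, u, d, or r.

```python
# Hypothetical downstream filter: keep the change stream broad and stable,
# and discard unwanted events later in the flow.
# Assumption: Debezium-style envelope with source.collection and op.

WANTED_COLLECTIONS = {"users"}  # hypothetical: the collections you need now
WANTED_OPS = {"c", "u", "d", "r"}  # create, update, delete, snapshot read


def keep_event(event):
    source = event.get("source", {})
    return source.get("collection") in WANTED_COLLECTIONS and event.get("op") in WANTED_OPS


events = [
    {"op": "u", "source": {"db": "shop", "collection": "users"}},
    {"op": "c", "source": {"db": "shop", "collection": "products"}},
]
print([e["source"]["collection"] for e in events if keep_event(e)])  # ['users']
```

Because the listener's pipeline never changes, its resume token stays valid; only the downstream set of wanted collections changes.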
Expanding scope (less restrictive):
- Loosen $match filters (filter out less).
- Broaden the pipeline.
This is more likely to work because the old resume token still points to an event that remains valid under the expanded stream definition, so MongoDB can resume successfully.
Simple mental model: a resume token is like a bookmark that's only reliable if you keep reading the "same edition" of the book (same pipeline/options). Changing the pipeline/options can invalidate the bookmark.
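For collection-list edits specifically, the shrinking-versus-expanding distinction can be checked before saving a change. The hypothetical Python check below compares the old and new Collections sets; an edit that both adds and removes collections is treated as shrinking, since something was removed. Pipeline edits such as a new $match stage can't be compared reliably this way and still need a manual review.

```python
# Hypothetical pre-change check for Collections edits only.


def classify_collection_change(old, new):
    old, new = set(old), set(new)
    if old == new:
        return "unchanged"
    if old <= new:
        return "expanding"  # only additions: resuming is more likely to work
    return "shrinking"  # something removed: treat as breaking for the cursor


if __name__ == "__main__":
    print(classify_collection_change(["users"], ["users", "products"]))
    print(classify_collection_change(["users", "products"], ["users"]))
```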
Why do I need an additional filter for snapshots (and why doesn't my main aggregation pipeline apply)?
Because snapshots and CDC streaming are two different phases in Debezium, and they use different mechanisms:
- CDC streaming (change stream): Your main aggregation pipeline is applied to the MongoDB change stream (Debezium applies it when streaming changes). This filters change events as they occur.
- Snapshot (historical load): Debezium does not read history through the change stream pipeline. It performs a separate snapshot read of existing documents to build the initial baseline, then transitions to streaming.
What this means in practice:
If you only set the main aggregation pipeline:
- Your snapshot may load more data than you intend (because it isn't filtered by the streaming pipeline).
- Your streaming is then filtered, creating a mismatch between "historical data" and "ongoing changes."
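What to do:
If you want the snapshot to match your intended dataset, you must configure a snapshot-specific filter in Additional properties (for example, using snapshot.collection.filter.overrides, which Debezium pairs with a snapshot.collection.filter.overrides.<databaseName>.<collectionName> property that holds the filter) so the snapshot phase only loads the documents you want.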
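As a sketch of how these snapshot-only properties fit together, the hypothetical Python helper below renders them as key=value lines. It assumes Additional properties takes one pair per line and that each filter is a MongoDB query document serialized as JSON; check the Debezium MongoDB connector documentation for your version for the exact filter format. Ideally the filter selects the same subset as the $match in your main aggregation pipeline, so history and ongoing changes line up.

```python
import json

# Hypothetical helper: render Debezium's snapshot-only filter properties.
# snapshot.collection.filter.overrides lists "database.collection" names;
# each listed collection gets its own
# snapshot.collection.filter.overrides.<database>.<collection> property.
# Assumption: the filter value is a JSON-serialized MongoDB query document.


def snapshot_filter_properties(filters):
    """filters maps "database.collection" -> query filter dict."""
    lines = [f"snapshot.collection.filter.overrides={','.join(filters)}"]
    for collection, query in filters.items():
        lines.append(f"snapshot.collection.filter.overrides.{collection}={json.dumps(query)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(snapshot_filter_properties({"shop.users": {"status": "active"}}))
```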