FAQs – MongoDB listener for change data capture (CDC)

Tyler Lamparter

Updated June 30, 2026 07:56

These FAQs relate to MongoDB listener for change data capture (CDC), also referred to as MongoDB CDC listener.

Prerequisites

Before you begin:

MongoDB CDC listener requires a Professional or Enterprise edition. See Celigo platform editions.
You need an existing MongoDB connection, or you can create one during setup. See Set up a connection to MongoDB.

How do MongoDB exports relate to the oplog/change stream?

Every MongoDB export configured as a listener is its own consumer of the MongoDB change stream / oplog.
A single export can listen to all collections, but if you have high change volume, it may fall behind.

Best practice for scale:
Create multiple exports, each with:
- Different collections (for example, one export for users and another for products), or
- Different aggregation pipelines (each export filters to its own subset)
This approach spreads the load across multiple consumers so you can keep up with all the logs.
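
As a rough illustration of spreading the load, the sketch below assigns collections to a fixed number of exports so that each export handles a similar share of the change volume. The collection names, the changes-per-minute estimates, and the plan_exports helper are all hypothetical; nothing here calls a Celigo or MongoDB API.

```python
import heapq

def plan_exports(change_rates, export_count):
    """Greedily assign collections to exports, busiest collection first.

    change_rates: dict of collection name -> estimated changes per minute
                  (hypothetical numbers you would measure yourself).
    Returns a list of (total_rate, [collections]) tuples, one per export.
    """
    # Min-heap keyed on current load; the export index breaks ties.
    heap = [(0, i, []) for i in range(export_count)]
    heapq.heapify(heap)
    for name, rate in sorted(change_rates.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx, names = heapq.heappop(heap)  # least-loaded export so far
        names.append(name)
        heapq.heappush(heap, (load + rate, idx, names))
    return [(load, names) for load, _, names in sorted(heap, key=lambda e: e[1])]

plan = plan_exports({"users": 50, "products": 400, "orders": 900, "carts": 350}, 2)
for total, names in plan:
    print(total, names)
```

With these made-up rates, the busiest collection gets its own export and the remaining three share the second one.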

What `snapshot.mode` options do I see versus what can I actually use?

For snapshot.mode, the drop-down list shows only:

when_needed
no_data

In Additional properties, advanced users can manually type any Debezium snapshot.mode value in the text field (such as initial, initial_only, or always).

That is:

The drop-down list provides safe, common choices.
The text field exposes the full set of Debezium snapshot modes, if you know what you're doing.

What's the basic difference between `when_needed` and `no_data`?

The basic difference in these snapshot.mode values in Additional properties:

no_data: Does not read existing data. Only captures new changes from the moment the listener starts.
when_needed:
- On first run (no cursor yet): performs an initial snapshot of matching collections, then streams new changes from that point.
- On later runs (cursor already exists): no new snapshot; it just continues from the existing cursor.
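
The rule above can be written as a small decision function. This is only a model of the behavior described in this FAQ, not Celigo or Debezium code; the function name runs_snapshot is made up for illustration.

```python
def runs_snapshot(snapshot_mode, cursor_exists):
    """Return True if the listener would read existing data on this start."""
    if snapshot_mode == "no_data":
        return False              # never reads existing documents
    if snapshot_mode == "when_needed":
        return not cursor_exists  # snapshot only when no cursor is stored
    raise ValueError(f"mode not covered by this sketch: {snapshot_mode}")

print(runs_snapshot("when_needed", cursor_exists=False))
print(runs_snapshot("when_needed", cursor_exists=True))
print(runs_snapshot("no_data", cursor_exists=False))
```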

How should I load historical data the first time for a collection?

Recommended: set snapshot.mode=when_needed in Additional properties on a new export for that collection.

First run: No cursor exists. Debezium runs a full snapshot of the collection(s). A cursor is established.
After that: The same export only reads new changes from that cursor onward.

So the simple pattern is:

Create a new listener/export for the collection(s) you're onboarding.
Set snapshot.mode=when_needed in Additional properties.
Let it run once to load history and establish the cursor.
Keep it running to continue as a MongoDB listener for change data capture (CDC).

Note

Use no_data only if you never want this export to do a snapshot (for example, you did a separate bulk/historical load already).

I started with `when_needed` for `users` collection, then later added `products` collection. Why didn't I get a historical load for `products` collection?

Because by the time you added the products collection, the export already had a valid cursor.

when_needed only snapshots when there is no cursor yet.
Once a cursor exists, adding a new collection does not trigger another snapshot.
Result: you get only new products changes, no historical products data.

This is the key "gotcha" to understand.
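
The timeline can be replayed with a toy model. The ListenerModel class below is hypothetical and only mirrors the when_needed behavior described above; it does not talk to MongoDB, Debezium, or Celigo.

```python
class ListenerModel:
    """Toy model of a when_needed export; not Celigo or Debezium code."""

    def __init__(self, collections):
        self.collections = set(collections)
        self.cursor_exists = False
        self.snapshotted = set()  # collections that received a historical load

    def start(self):
        # when_needed: snapshot only if no cursor has been stored yet.
        if not self.cursor_exists:
            self.snapshotted |= self.collections
            self.cursor_exists = True

    def add_collection(self, name):
        # Adding a collection leaves the stored cursor untouched.
        self.collections.add(name)

listener = ListenerModel(["users"])
listener.start()                     # first run: snapshots users, stores a cursor
listener.add_collection("products")
listener.start()                     # cursor exists: no snapshot for products
print(sorted(listener.snapshotted))
```

In this model only users ends up with a historical load, even though products is now part of the export.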

How can I get historical data for a new collection if my export already has a cursor?

When an export already has a stored cursor, adding a new collection does not automatically trigger a new snapshot. The export typically continues from its existing cursor and captures only new changes.

To load historical data for a newly added collection, choose one of the following options.

Option A – Reset the cursor (recommended)

Use Cursor management to reset the export's stored cursor.

Note

The Cursor management section is displayed only when the export has a stored CDC cursor (offset). Celigo typically stores the offset after approximately five minutes of active CDC streaming. If no cursor exists, the Cursor management section is hidden.

Once the section becomes available:

Expand Cursor management.
Click Reset to stage a cursor reset.
Save the export to apply the reset.

When the export is saved, Celigo clears the stored cursor and restarts CDC processing. The export typically resumes processing within approximately one minute.

After the restart, behavior depends on the value configured in snapshot.mode. Depending on the selected mode, the connector may perform a snapshot before resuming CDC processing.

Pros
- Product-supported approach.
- No need to create additional exports.
- No need to manually manage cursor state.

Option B – Temporarily change snapshot.mode on the existing export

Add the new collection (for example, products) to Collections.
In Additional properties, change snapshot.mode to something like initial.
Restart the export so it snapshots everything that matches (including the new collection).
After the snapshot completes, switch snapshot.mode back to when_needed or no_data.

Pros: Single export handles everything.
Cons: Snapshots all collections in that export, not just the new one.

Option C – Use a separate export for historical loads

Create a new export only for historical data:
- In Collections: only the new collection(s) you want the history for.
- In Additional properties, set snapshot.mode=when_needed (or another snapshot mode you prefer).
Run it once to load historical data.
Disable/remove that export when finished.
Add the new collection to your main CDC export.
- It now only needs to capture new changes going forward.

Pros: Keeps historical loading separate from the main, stable listener.

Why don't I see the `after` or `before` option in my listener data?

There are two layers that control this option:

1. Debezium needs to provide the field.

Debezium capture.mode controls whether before / after exist at all.
- For MongoDB, Debezium only includes before / after for update and delete events if capture.mode is set to the right value:
  - after on update events: only present when capture.mode is one of:
    - change_streams_update_full
    - change_streams_update_full_with_pre_image (Stack Overflow)
  - before on update/delete events: only present when capture.mode is one of the *_with_pre_image modes, for example:
    - change_streams_with_pre_image
    - change_streams_update_full_with_pre_image (Debezium)
  If your capture.mode is set to a mode that doesn't include them, Debezium never sends before/after in the event, so Celigo can't show them, no matter what you pick in Fields to include.

2. The listener's Fields to include decides whether Celigo keeps it.

Fields to include is just a filter on top of Debezium.

The Fields to include drop-down list in the Create listener form for MongoDB is populated from the Debezium event envelope. It lists the top-level fields that you can make available in your flow.

Examples:
- after
- before
- op
- removedFields
- updatedFields
- ts_ms, ts_ns, ts_us
- schema, source, transaction, truncatedArrays, and so on
Think of it as a pre-transform field filter:
- If you don't select after, even if Debezium is sending it, Celigo will drop it, and you won't see it in the payload you map/transform.
- If you do select after but your capture.mode doesn't provide it, it'll just be missing/null because Debezium never sent it.
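
The two layers can be modeled as a set intersection: a field reaches your flow only if Debezium sent it and you selected it. The sketch below covers only before and after, uses the capture.mode lists given in this FAQ, and assumes Debezium's op codes "u" (update) and "d" (delete); the visible_fields helper is hypothetical.

```python
# capture.mode values that supply each field, as listed in this FAQ.
AFTER_ON_UPDATE = {"change_streams_update_full",
                   "change_streams_update_full_with_pre_image"}
BEFORE_ON_UPDATE_DELETE = {"change_streams_with_pre_image",
                           "change_streams_update_full_with_pre_image"}

def visible_fields(capture_mode, fields_to_include, op="u"):
    """before/after fields that survive both layers for an update or delete."""
    # Layer 1: what Debezium puts in the event for this capture.mode.
    sent = set()
    if op == "u" and capture_mode in AFTER_ON_UPDATE:
        sent.add("after")
    if op in ("u", "d") and capture_mode in BEFORE_ON_UPDATE_DELETE:
        sent.add("before")
    # Layer 2: Fields to include can only keep what Debezium actually sent.
    return sent & set(fields_to_include)

print(visible_fields("change_streams", ["after", "before"]))
print(visible_fields("change_streams_update_full", ["after", "before"]))
```

The first call returns an empty set because the plain change_streams mode is in neither list, so selecting the fields makes no difference.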

Why does my MongoDB CDC listener break when I remove collections or make the aggregation pipeline more restrictive?

This usually happens because Celigo's MongoDB CDC (Debezium) is resuming from a stored cursor position (resume token/offset) that was created under the previous "shape" of the change stream.

Debezium reads MongoDB change streams and stores progress (offset) so it can resume on restart. (Debezium)
MongoDB change streams are defined by an aggregation pipeline. When you resume with a resume token, MongoDB warns that you should use the same pipeline and options that were used to generate that token. Changing them can prevent resuming or create inconsistent or unpredictable behavior. (MongoDB)

The effect depends on whether the change shrinks or expands the scope of the stream:
1. Shrinking scope (more restrictive)
  - Add stronger $match filters (filter out more events).
  - Change the pipeline/options in a way that excludes what used to be included.
  In these cases, the last stored resume token may refer to an event that the new stream definition does not include (or can't "land on" in the resumed stream). When Debezium restarts and tries to resume, MongoDB may fail to resume that cursor, so it's referred to as a "bad cursor."
  
  Treat "shrinking" changes as breaking changes to the cursor. Recommended options:
  - Option A – Start fresh (cleanest)
    
    Create a new listener/export with the updated collection list / pipeline.
    
    Let it establish a new cursor/offset (and do a snapshot if needed).
  - Option B – Keep the listener stable; filter downstream
    
    Keep the change stream pipeline as stable as possible.
    
    Do projection/filtering later in your flow (mapping/transform), so you don't invalidate resuming.
  - Option C – Reset cursor
    
    Plan for a cursor reset if the old resume token may not be reusable after tightening scope.
2. Expanding scope (less restrictive)
  - Loosen $match filters (filter out less)
  - Broaden the pipeline
  This is more likely to work because the old resume token still points to an event that remains valid under the expanded stream definition, so MongoDB can resume successfully.
  
  Simple mental model: a resume token is like a bookmark that's only reliable if you keep reading the "same edition" of the book (same pipeline/options). Changing the pipeline/options can invalidate the bookmark.
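
One way to notice that the "edition" has changed is to keep a fingerprint of the stream definition next to your own deployment notes and compare it before you save a change. This is a sketch of a manual safeguard, not a Celigo feature; the helper names and the example pipeline are hypothetical.

```python
import hashlib
import json

def pipeline_fingerprint(collections, pipeline):
    """Stable hash of the stream definition (collections plus pipeline stages)."""
    canonical = json.dumps(
        {"collections": sorted(collections), "pipeline": pipeline},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def needs_cursor_review(saved_fingerprint, collections, pipeline):
    """True when the definition differs from the one the cursor was created under."""
    return saved_fingerprint != pipeline_fingerprint(collections, pipeline)

# Fingerprint recorded when the cursor was first established.
saved = pipeline_fingerprint(["users", "products"],
                             [{"$match": {"operationType": "insert"}}])

# Removing a collection changes the fingerprint, so plan for a cursor reset.
print(needs_cursor_review(saved, ["users"],
                          [{"$match": {"operationType": "insert"}}]))
```

The check flags any change, shrinking or expanding. Deciding which kind it is, and whether to reset the cursor, remains a judgment call.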

Why do I need an additional filter for snapshots (and why doesn't my main aggregation pipeline apply)?

Because snapshots and CDC streaming are two different phases in Debezium, and they use different mechanisms:

CDC streaming (change stream): Your main aggregation pipeline is applied to the MongoDB change stream (Debezium applies it when streaming changes). This filters change events as they occur.
Snapshot (historical load): Debezium does not read history via the change stream pipeline. It performs a separate snapshot read of existing documents to build the initial baseline, then transitions to streaming.

What this means in practice

If you only set the main aggregation pipeline:

Your snapshot may load more data than you intend (because it isn't filtered by the streaming pipeline).
Your streaming will then be filtered, creating a mismatch between "historical data" and "ongoing changes."

What to do

If you want the snapshot to match your intended dataset, you must configure a snapshot-specific filter in Additional properties (for example, using snapshot.collection.filter.overrides) so the snapshot phase only loads the documents you want.
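
As a hedged sketch, the helper below builds the key/value pairs for a snapshot filter. It assumes the Debezium MongoDB connector convention of one property listing the collections as database.collection, plus one property per collection holding the filter expression. The database name shop, the filter itself, and the way Celigo's Additional properties accepts these keys are assumptions to verify in your own account.

```python
import json

def snapshot_filter_properties(filters):
    """Build Debezium-style snapshot filter properties.

    filters: dict of "database.collection" -> MongoDB filter document.
    """
    names = sorted(filters)
    # One property lists every collection that has an override.
    props = {"snapshot.collection.filter.overrides": ",".join(names)}
    for name in names:
        # One property per collection carries its filter as JSON text.
        key = f"snapshot.collection.filter.overrides.{name}"
        props[key] = json.dumps(filters[name], separators=(",", ":"))
    return props

props = snapshot_filter_properties({"shop.products": {"status": "active"}})
for key, value in props.items():
    print(f"{key}={value}")
```

Use the same condition here that your streaming pipeline's $match applies, so that the historical load and the ongoing changes describe the same set of documents.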
