
improve: clarify EventProcessor logging and contract for events received before start #3382

Merged
csviri merged 2 commits into operator-framework:main from Dennis-Mircea:improve/clarify-deferred-event-log
May 27, 2026

Conversation

@Dennis-Mircea
Contributor

@Dennis-Mircea Dennis-Mircea commented May 27, 2026

Summary

When the SDK starts up, there is a short window (observed at ~140 ms on Flink Kubernetes Operator startup) between when EventSourceManager starts the informers and when EventProcessor.start() is actually called. During that window, the initial informer LIST emits ADDED events for every existing resource, and EventProcessor.handleEvent(...) receives them while running == false.

The events are not lost. handleEvent calls resourceStateManager.getOrCreateOnResourceEvent(...), records the metric, and then calls handleEventMarking(event, state) before checking running. When EventProcessor.start() is eventually invoked, it sets running = true and calls handleAlreadyMarkedEvents(), which replays every state with eventPresent(). Delete events are also short-circuited to cleanupForDeletedEvent(...) even when not running.
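
The mark-then-replay flow described above can be sketched with a small self-contained model. This is only an illustration under stated assumptions: `DeferringProcessor`, its string resource ids, and the `processed()` helper are hypothetical stand-ins and not the SDK's actual classes or signatures. The delete short-circuit and metrics are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the real EventProcessor; it only models mark-then-replay. */
class DeferringProcessor {
  // Resource id -> "an event is pending" flag (stands in for the per-resource state).
  private final Map<String, Boolean> eventPresent = new LinkedHashMap<>();
  private final List<String> processed = new ArrayList<>();
  private boolean running = false;

  synchronized void handleEvent(String resourceId) {
    // Mark first, check running second: this ordering is what preserves early events.
    eventPresent.put(resourceId, true);
    if (!running) {
      System.out.println(
          "Deferring event: " + resourceId + " until the event processor starts");
      return;
    }
    submit(resourceId);
  }

  synchronized void start() {
    running = true;
    // Replay every resource that was marked while the processor was not running.
    for (String id : new ArrayList<>(eventPresent.keySet())) {
      if (eventPresent.get(id)) {
        submit(id);
      }
    }
  }

  private void submit(String resourceId) {
    // Clear the mark and record the resource as handled.
    eventPresent.put(resourceId, false);
    processed.add(resourceId);
  }

  synchronized List<String> processed() {
    return new ArrayList<>(processed);
  }
}
```

In this model, events handed to `handleEvent` before `start()` produce only the "Deferring" line and are processed, in arrival order, once `start()` runs. Events that arrive afterwards are processed immediately.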

The problem is that the log message in this branch is alarming and reads exactly like dropped work:

Skipping event: ResourceEvent{...} because the event processor is not started

…and there is no JavaDoc on handleEvent explaining the marking-then-replay contract, so the only way to confirm "events are not lost" is to read the code.

This PR clarifies both:

  1. Log message reworded. From "Skipping event: {} because the event processor is not started" to "Deferring event: {} until the event processor starts". Same log level, same information, no behavior change. It simply stops users (and downstream operator authors debugging startup) from believing the SDK is silently discarding events.
  2. JavaDoc added on handleEvent. One short paragraph that documents the deferral-and-replay contract and points at #start() and #handleAlreadyMarkedEvents() so future readers don't have to reverse-engineer the flow.

No behavior change. Strictly an observability + documentation improvement.

Why this came up

While debugging an unrelated issue on Flink Kubernetes Operator, I noticed the following sequence in DEBUG logs at startup:

EventSourceManager   [DEBUG] Starting event source ControllerResourceEventSource ...
InformerWrapper      [DEBUG] Starting informer for namespace: JOSDK_ALL_NAMESPACES ...
EventProcessor       [DEBUG] Received event: ResourceEvent{action=ADDED, ...}
EventProcessor       [DEBUG] Skipping event: ResourceEvent{...} because the event processor is not started
... (repeats for many resources) ...
EventProcessor       [DEBUG] Starting event processor: ...

The "Skipping event ... not started" message is misleading because it reads as if the initial state were silently dropped. The code is correct (the events are marked and replayed via handleAlreadyMarkedEvents), but the log line and the missing JavaDoc made that hard to tell from operator logs alone.

…ved before start

Signed-off-by: Dennis-Mircea Ciupitu <dennis.mircea.ciupitu@gmail.com>
Copilot AI review requested due to automatic review settings May 27, 2026 11:20
@openshift-ci openshift-ci Bot requested review from csviri and xstefank May 27, 2026 11:20
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improves clarity around how the EventProcessor behaves when receiving events before it has been started, by documenting the deferral mechanism and updating the associated debug log message.

Changes:

  • Added Javadoc to handleEvent describing deferral/replay behavior during the startup window
  • Updated debug logging from “Skipping” to “Deferring” to better reflect actual behavior

Collaborator

@csviri csviri left a comment


LGTM. thank you!

@csviri
Collaborator

csviri commented May 27, 2026

@Dennis-Mircea please format the code (it is enough if you run mvn clean install locally). thx!

Signed-off-by: Dennis-Mircea Ciupitu <dennis.mircea.ciupitu@gmail.com>
@Dennis-Mircea
Contributor Author

@Dennis-Mircea please format the code (it is enough if you run mvn clean install locally). thx!

Done!

@csviri csviri merged commit 257497d into operator-framework:main May 27, 2026
26 checks passed


3 participants