Skip to content

Fix SHC appframework bundle-push loop in FIPS mode#1672

Closed
dpericaxon wants to merge 2 commits intosplunk:developfrom
dpericaxon:fix/fips-shc-bundlepush-loop-develop
Closed

Fix SHC appframework bundle-push loop in FIPS mode#1672
dpericaxon wants to merge 2 commits intosplunk:developfrom
dpericaxon:fix/fips-shc-bundlepush-loop-develop

Conversation

@dpericaxon
Copy link
Copy Markdown

Description

This PR fixes a retry storm / “app update loop” seen when Splunk Operator (v3.0.0) manages an SHC with App Framework enabled and Splunk is running in FIPS mode. In this scenario, the SHC bundle push status file can contain benign FIPS/TLS banner output (and sometimes ConfDeploymentException: ... already running) which was previously treated as a hard failure. The operator would reset the bundle push state back to Pending and immediately re-trigger the bundle push, causing collisions and preventing SHC app updates from completing.

This change makes the status parsing more robust by treating known-benign output as “still in progress” rather than a failure.

Key Changes

  • pkg/splunk/enterprise/afwscheduler.go

    • Use normalized status output (strip benign FIPS banner + TLS hostname warning) when determining completion.
    • Treat banner-only output as in-progress (do not reset bundle state to Pending).
    • Treat ConfDeploymentException: Can't start deployment job as one is already running! as in-progress (do not reset bundle state to Pending / do not re-trigger).
    • Preserve existing behavior for true errors (non-empty output that is not success and not the known benign cases).
  • pkg/splunk/enterprise/afwscheduler_test.go

    • Add unit test coverage for:
      • FIPS banner-only status output ⇒ remains BundlePushInProgress
      • FIPS banner + “already running” ⇒ remains BundlePushInProgress

Testing and Verification

  • Unit tests:
    • go test ./pkg/splunk/enterprise -run '^TestSHCRunPlaybook$' -v
  • Behavioral verification (manual):
    • Verified the operator no longer resets bundle push state to Pending when the status output contains only the FIPS/TLS banner or the “already running” ConfDeploymentException.
    • Verified SHC bundle push progresses without a re-trigger storm and SHC app updates can complete.

Related Issues

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

- Treat FIPS banner-only status output as in-progress

- Treat ConfDeploymentException 'already running' as in-progress

- Add unit tests
continue
}

// FIPS banner emitted by Splunk CLI in FIPS-enabled environments
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a list of benign banners somewhere that we can add to the checks here? I'm worried if we merge this it will only cover a specific scenario with hard coded values.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't find out, then as of now, we can just make a list of such banners (as of now 1-element), use this list in the code to validate the output and once we learn more, we can expand the list with new banners.

Copy link
Copy Markdown
Author

@dpericaxon dpericaxon Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlieberman-splunk @kasiakoziol is this an action item on your side to dig into this? For now I Implemented a benignLinePrefixes list in normalizeSHCBundlePushStatusOutput() (currently includes the known FIPS banner and the TLS hostname validation warning). This keeps the logic extensible and we can expand the list as additional benign lines are identified.Implemented a benignLinePrefixes list in normalizeSHCBundlePushStatusOutput() (currently includes the known FIPS banner and the TLS hostname validation warning). This keeps the logic extensible and we can expand the list as additional benign lines are identified.

normalizedOut := normalizeSHCBundlePushStatusOutput(stdOut)

// Check if we did not get the desired output in the status file. There can be 2 scenarios -
// 1. stdOut is empty, which means bundle push is still in progress
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment for consistency - stdOut to normalizedOut

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Updated the comment to reference normalizedOut for consistency with the normalized parsing logic.

@kasiakoziol
Copy link
Copy Markdown
Collaborator

Did you perform manual tests with the fix on both FIPS and non-FIPS enabled clusters?

@kasiakoziol kasiakoziol mentioned this pull request Jan 23, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 23, 2026

CLA Assistant Lite bot CLA Assistant Lite bot All contributors have signed the COC ✍️ ✅

@dpericaxon
Copy link
Copy Markdown
Author

I have read the Code of Conduct and I hereby accept the Terms

@dpericaxon
Copy link
Copy Markdown
Author

dpericaxon commented Jan 23, 2026

recheck

@dpericaxon
Copy link
Copy Markdown
Author

dpericaxon commented Jan 23, 2026

I have read the CLA Document and I hereby sign the CLA

@kubabuczak
Copy link
Copy Markdown
Collaborator

kubabuczak commented Jan 23, 2026

I have read the CLA Document and I hereby accept the Terms

@dpericaxon you need to post comment with "I have read the CLA Document and I hereby sign the CLA" - you hae slightly different wording

@dpericaxon
Copy link
Copy Markdown
Author

dpericaxon commented Jan 23, 2026

Did you perform manual tests with the fix on both FIPS and non-FIPS enabled clusters?

I did it for FIPs clusters. Do you perhaps have a non-fips cluster to test in quickly? If not I can see what I can come up with.

@dpericaxon
Copy link
Copy Markdown
Author

Hey @kubabuczak and @gabrielm-splunk I noticed that a new release got cut but the change above didn't make it in. Isn't this effecting other customers running on fips pods?

@kubabuczak
Copy link
Copy Markdown
Collaborator

Calling @rlieberman-splunk , as she is working on FIPS certification for this release

@rlieberman-splunk
Copy link
Copy Markdown
Collaborator

Hi @dpericaxon , I have opened a PR with similar changes and tested them on our FIPS enabled environment. These changes will be included in the next release.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants