Xutongr/sync#699

Open
xutongNV wants to merge 28 commits into feature/PROJ-147-operator-redesign from xutongr/sync

@xutongNV
Contributor

Description

Issue #147

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

fernandol-nvidia and others added 26 commits March 10, 2026 20:07
* Initial commit

* Remove timing logs and lint

* Remove timing logs
…668)

* Add default filter (STATUS) to occupancy page

* Use compact bytes unit in Occupancy page
* Properly fix unique Prometheus ports per service (redo of #649)

PR #649 fixed port conflicts only in the bazel run scripts, meaning the
problem persisted when services were launched directly via their bazel
targets. This commit fixes it at the source by overriding
metrics_prometheus_port in each service's config class, so the correct
port is used regardless of how the service is started.

Port assignments (core stays at the base default of 9464):
  - worker:            9465
  - delayed_job_monitor: 9466
  - backend_listener:  9467
  - backend_worker:    9468

Kubernetes manifests are updated to match these new defaults.
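
The per-service override described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: only `metrics_prometheus_port`, `WorkerConfig`, and the port numbers come from the commit message; the base class, the other class names, and the `ports_by_service` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    """Hypothetical base config; 9464 is the base default named in the commit."""
    metrics_prometheus_port: int = 9464


@dataclass
class CoreConfig(ServiceConfig):
    """Core keeps the base default (class name is an assumption)."""


@dataclass
class WorkerConfig(ServiceConfig):
    """Worker overrides the default so it never collides with core."""
    metrics_prometheus_port: int = 9465


@dataclass
class DelayedJobMonitorConfig(ServiceConfig):
    """Hypothetical name for the delayed_job_monitor config."""
    metrics_prometheus_port: int = 9466


@dataclass
class BackendListenerConfig(ServiceConfig):
    """Hypothetical name for the backend_listener config."""
    metrics_prometheus_port: int = 9467


@dataclass
class BackendWorkerConfig(ServiceConfig):
    """Hypothetical name for the backend_worker config."""
    metrics_prometheus_port: int = 9468


ALL_CONFIGS = [
    CoreConfig,
    WorkerConfig,
    DelayedJobMonitorConfig,
    BackendListenerConfig,
    BackendWorkerConfig,
]


def ports_by_service(config_classes):
    """Map each config class name to its default port, failing on duplicates."""
    ports = {}
    for cls in config_classes:
        port = cls().metrics_prometheus_port
        if port in ports.values():
            raise ValueError(f"duplicate Prometheus port {port} for {cls.__name__}")
        ports[cls.__name__] = port
    return ports
```

Because the default lives on the config class, the port is the same whether the service is started from a run script or directly from its bazel target. A uniqueness check like `ports_by_service(ALL_CONFIGS)` could run in a unit test to catch future collisions.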

Also fixes a pre-existing port name collision: the oauth2-proxy sidecar
declared its metrics port as "metrics", conflicting with the OSMO
service container port of the same name in the same pod. Renamed to
"oauth2-metrics" across all three chart sidecar helpers (service,
router, web-ui).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert run script port overrides made redundant by previous commit

The --metrics_prometheus_port flags added to the bazel run scripts by
#649 are now superseded by the per-service config class defaults. Remove
them to keep the scripts clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add oauth2-metrics endpoint to service PodMonitor

The oauth2-proxy sidecar port was renamed from "metrics" to
"oauth2-metrics" to avoid a pod-level port name collision. Add a
corresponding PodMonitor endpoint so Prometheus continues scraping
oauth2-proxy metrics when the sidecar is enabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix pylint missing-class-docstring in WorkerConfig

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Catch error when timing out

* update
* Revert "new fix for prometheus ports (#670)"

This reverts commit 122af26.

* Revert "Assign unique Prometheus metrics ports per service in bazel mode (#649)"

This reverts commit ef45396.

* Fix OAuth2 metrics

* Skip Prometheus metrics server in dev mode for all services

Prevents port conflicts when running multiple services locally with
--method=dev. Consistent with the pattern already applied to worker.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Skip metrics server startup when otel metrics are disabled

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "Skip Prometheus metrics server in dev mode for all services"

This reverts commit 9211ccc.

* Disable metrics by default

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The osmo_workspace+ directory containing cross-workspace dependencies
may exist inside the runfiles directory rather than at the image root.
Unify the pythonpath append to use local_runfiles_dir so it resolves
correctly in both cases.
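
One way to picture the unified lookup is the sketch below. It is an assumption-laden illustration only: the real change may live in a Bazel rule or shell entrypoint rather than Python, and the function name and `path` parameter are hypothetical. Only `local_runfiles_dir` and the `osmo_workspace+` directory name come from the commit message.

```python
import os
import sys


def append_workspace_to_pythonpath(local_runfiles_dir, path=None,
                                   workspace_dir_name="osmo_workspace+"):
    """Append <local_runfiles_dir>/osmo_workspace+ to the import path if present.

    Resolving relative to local_runfiles_dir (instead of a hard-coded image
    root) covers both layouts, since the caller passes whichever directory
    actually holds the workspace folder.
    Returns the appended directory, or None if nothing was appended.
    """
    path = sys.path if path is None else path
    candidate = os.path.join(local_runfiles_dir, workspace_dir_name)
    if os.path.isdir(candidate) and candidate not in path:
        path.append(candidate)
        return candidate
    return None
```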
Fix minor bugs in aws deployment script

- Upgrade the RDS database to Postgres 15.12 (15.4 is no longer supported)
- Fix CLI parameters not being correctly propagated to Terraform
- Add support for NGC_API_KEY to pull non-public helm charts and images
- Fix bug where dry-run would not stop after terraform and would throw errors
Add basic redaction of secrets in workflow specs
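
The commit message gives no detail on how the redaction works. As a rough sketch of what basic key-based redaction of a nested spec might look like, assuming secrets are identified by key name (the pattern, placeholder string, and function name below are all hypothetical):

```python
import re

# Hypothetical pattern for keys treated as sensitive; the real rules are not
# stated in the commit message.
SENSITIVE_KEY = re.compile(r"(password|secret|token|api_key)", re.IGNORECASE)
PLACEHOLDER = "[REDACTED]"


def redact_spec(spec):
    """Return a copy of a nested dict/list spec with sensitive values masked."""
    if isinstance(spec, dict):
        return {
            key: PLACEHOLDER
            if isinstance(key, str) and SENSITIVE_KEY.search(key)
            else redact_spec(value)
            for key, value in spec.items()
        }
    if isinstance(spec, list):
        return [redact_spec(item) for item in spec]
    return spec
```

The function builds a new structure rather than mutating its input, so the original spec remains available to the code that actually runs the workflow.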
* Fix user/pool filters in occupancy page

* Add occupancy to pools quicklink

* Add cross link from occupancy page to workflows, refactor for better code reuse

* Coderabbit

* Send `all_users` and `all_pools` for occupancy page when necessary

* Include initializing into default filters for occupancy

* Coderabbit
* fix backend listener

* lint
* Add verbose mode for config show

* Fix for past revisions
* Add datetime filter and use it in workflows page

* Clean up and simplify

* Update styling

* Coderabbit
* Add --offset and -f option to workflow list command
Add unit tests for the "group template" feature
@xutongNV xutongNV requested a review from a team as a code owner March 12, 2026 17:28
@coderabbitai

coderabbitai bot commented Mar 12, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 503f6ac8-5605-4b3f-b664-afbdc484406e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions bot commented Mar 12, 2026

📖 Docs preview: https://d3in15bfzp49i0.cloudfront.net/699/index.html

9 participants