
feat(ci): live health dashboard — GitHub API + Gradio Space #3324

Merged
pkooij merged 2 commits into feat/benchmark-ci from feat/health-dashboard
Apr 8, 2026


pkooij commented Apr 8, 2026

Summary

Adds CI infrastructure so that the private LeRobot health dashboard at lerobot/health-dashboard has data to display. Stacked on top of #3319.

The Space itself lives in a standalone Hugging Face Spaces repo; this PR only adds or changes the
two files needed on the lerobot side:

| File | Purpose |
|---|---|
| `scripts/ci/parse_eval_metrics.py` | Reads `eval_info.json` written by `lerobot-eval`, extracts `pc_success` + `n_episodes`, writes `metrics.json` |
| `.github/workflows/benchmark_tests.yml` | Adds "Parse metrics" + "Upload metrics" artifact steps after each eval (`if: always()`) |
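The parsing step could look roughly like the following. This is a hypothetical sketch, not the script merged here: it assumes `eval_info.json` keeps aggregate results under an `aggregated` key and per-episode records under `per_episode`, and the `--eval-info` / `--output` flags are invented for illustration.

```python
from __future__ import annotations

import argparse
import json
from pathlib import Path


def extract_metrics(eval_info: dict) -> dict:
    """Pull the two numbers the dashboard plots out of an eval_info.json payload."""
    # Assumption: aggregate stats sit under "aggregated"; fall back to the top level.
    aggregated = eval_info.get("aggregated", eval_info)
    pc_success = aggregated.get("pc_success")
    # Assumption: without an explicit count, count the per-episode records.
    n_episodes = aggregated.get("n_episodes", len(eval_info.get("per_episode", [])))
    return {
        "pc_success": None if pc_success is None else float(pc_success),
        "n_episodes": int(n_episodes),
    }


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Write metrics.json for the health dashboard")
    parser.add_argument("--eval-info", type=Path, required=True)  # hypothetical flag
    parser.add_argument("--output", type=Path, required=True)     # hypothetical flag
    args = parser.parse_args(argv)

    if args.eval_info.exists():
        metrics = extract_metrics(json.loads(args.eval_info.read_text()))
    else:
        # The step runs with `if: always()`, so the eval may have failed before
        # writing eval_info.json; emit an explicit empty record instead of crashing.
        metrics = {"pc_success": None, "n_episodes": 0}

    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(json.dumps(metrics, indent=2))
```

In a real script, `main()` would be called from an `if __name__ == "__main__":` guard. Writing an empty record on a missing input (rather than failing) is one way to keep the "Upload metrics" step meaningful when the eval itself breaks.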

How the dashboard works (no extra datastore):

  • Space queries the GitHub Actions API directly with a read-only fine-grained token
    (`Actions=read`, `Metadata=read` on `huggingface/lerobot` only)
  • Each workflow is fetched individually via `/actions/workflows/{id}/runs?branch=main&per_page=1`,
    so infrequent scheduled jobs (Nightly Deps, Docker) always appear, even when more frequent workflows would push them out of a repo-wide run listing
  • Benchmark jobs upload a `metrics.json` artifact — the Space downloads and parses it for
    success rate and episode count
  • Rollout videos are fetched from the artifact zip and cached on disk by artifact ID
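A minimal sketch of the query and artifact-parsing steps, using only the standard library. The endpoint path follows the GitHub REST API (`GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs`); the helper names are hypothetical, and the actual artifact download (listing a run's artifacts and fetching the zip) is left out.

```python
from __future__ import annotations

import io
import json
import os
import urllib.request
import zipfile

API_ROOT = "https://api.github.com"
REPO = "huggingface/lerobot"


def latest_run_url(workflow_id: int | str, branch: str = "main") -> str:
    """One request per workflow, asking only for its newest run on `branch`."""
    return (
        f"{API_ROOT}/repos/{REPO}/actions/workflows/{workflow_id}/runs"
        f"?branch={branch}&per_page=1"
    )


def github_json(url: str, token: str | None = None) -> dict:
    """GET a GitHub REST endpoint with the read-only token and decode the JSON body."""
    token = token or os.environ["GITHUB_RO_TOKEN"]
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


def summarize_latest_run(payload: dict) -> dict | None:
    """Reduce a `.../runs` response to the fields a status table needs."""
    runs = payload.get("workflow_runs", [])
    if not runs:
        return None  # the workflow has never run on this branch
    run = runs[0]
    return {
        "run_id": run["id"],
        "status": run["status"],          # e.g. "completed", "in_progress"
        "conclusion": run["conclusion"],  # e.g. "success", "failure"; None while running
        "url": run["html_url"],
    }


def metrics_from_artifact_zip(zip_bytes: bytes, member: str = "metrics.json") -> dict:
    """Find metrics.json inside a downloaded artifact zip and parse it."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Match on the basename in case the file sits in a subdirectory.
            if name.rsplit("/", 1)[-1] == member:
                return json.loads(archive.read(name))
    raise KeyError(f"{member} not found in artifact")
```

Usage would be along the lines of `summarize_latest_run(github_json(latest_run_url("benchmark_tests.yml")))`; the workflow file name can stand in for the numeric workflow ID.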

Dashboard panels:

  • Overall health banner (green / yellow / red)
  • CI status table grouped by: Tests · Benchmarks · Build & Publish · Quality
  • Success rate and duration trend charts (last 30 benchmark runs)
  • Latest rollout video per benchmark (LIBERO, MetaWorld)
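How the banner colour, the grouped table, and the trend series might be derived from per-run data is sketched below. The colour rule, the row shape (a `group` field per row), and the `pc_success` field merged into each run record are all hypothetical; run duration is approximated as `updated_at - run_started_at`.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime

GROUP_ORDER = ("Tests", "Benchmarks", "Build & Publish", "Quality")


def overall_health(conclusions: list[str | None]) -> str:
    """Hypothetical rule: any failure -> red, all success -> green, otherwise yellow."""
    if not conclusions:
        return "yellow"  # no data yet
    if any(c == "failure" for c in conclusions):
        return "red"
    if all(c == "success" for c in conclusions):
        return "green"
    return "yellow"  # e.g. cancelled, skipped, or still running (conclusion is None)


def group_rows(rows: list[dict]) -> dict[str, list[dict]]:
    """Bucket status-table rows by their "group" field, in a fixed display order."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        buckets[row["group"]].append(row)
    return {group: buckets[group] for group in GROUP_ORDER if group in buckets}


def _parse_ts(ts: str) -> datetime:
    # GitHub returns ISO 8601 timestamps with a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def trend_points(runs: list[dict], limit: int = 30) -> list[tuple[str, float | None, float]]:
    """(created_at, pc_success, duration in minutes) for the newest `limit` runs, oldest first."""
    newest = sorted(runs, key=lambda r: r["created_at"])[-limit:]
    points = []
    for run in newest:
        duration = _parse_ts(run["updated_at"]) - _parse_ts(run["run_started_at"])
        points.append((run["created_at"], run.get("pc_success"), duration.total_seconds() / 60))
    return points
```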

Test plan

  • Verify `parse_eval_metrics.py` writes correct `metrics.json` after a libero/metaworld eval
  • Verify `libero-metrics` / `metaworld-metrics` artifacts appear in the Actions UI
  • Open lerobot/health-dashboard — confirm status table, charts, and videos load (requires `GITHUB_RO_TOKEN` Space secret to be set)
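For the first test-plan item, a quick sanity check on the produced file could look like this. It is a hypothetical helper that assumes `metrics.json` holds `pc_success` (a percentage in 0 to 100, or `null` when there is no result) and `n_episodes` (a non-negative integer); adjust it to whatever schema the script actually writes.

```python
from __future__ import annotations

import json
from pathlib import Path


def validate_metrics(metrics: dict) -> list[str]:
    """Return a list of problems with a metrics.json payload (an empty list means OK)."""
    problems = []
    for key in ("pc_success", "n_episodes"):
        if key not in metrics:
            problems.append(f"missing key: {key}")
    pc = metrics.get("pc_success")
    # NaN also fails this range check, since every comparison with NaN is False.
    if pc is not None and not (isinstance(pc, (int, float)) and 0.0 <= pc <= 100.0):
        problems.append(f"pc_success out of range: {pc!r}")
    n = metrics.get("n_episodes")
    if n is not None and not (isinstance(n, int) and n >= 0):
        problems.append(f"n_episodes must be a non-negative int: {n!r}")
    return problems


def validate_metrics_file(path: str | Path) -> list[str]:
    """Load a metrics.json from disk and validate it."""
    return validate_metrics(json.loads(Path(path).read_text()))
```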

🤖 Generated with Claude Code

pkooij and others added 2 commits April 8, 2026 17:46
- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub
  Actions API directly (no extra datastore). Shows benchmark status
  badges, success-rate and duration trend charts, and embeds the latest
  rollout video per benchmark. Results cached 5 min in-memory; video
  files cached on disk by artifact ID so downloads only happen once.
- spaces/health-dashboard/requirements.txt + README.md: Space card with
  setup instructions for the GITHUB_RO_TOKEN secret (actions:read,
  metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval,
  reads eval_info.json written by lerobot-eval, extracts pc_success and
  n_episodes, and writes metrics.json to the artifacts dir.
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and
  "Upload … metrics" steps (if: always()) after each eval so the
  dashboard has data even when the eval fails.
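The two caching layers described in this commit (results cached 5 min in memory, videos cached on disk by artifact ID) can be sketched as follows. Both helpers are hypothetical: the 5-minute TTL comes from the commit message, while the `.mp4` file name, the `download` callback, and the cache directory are assumptions.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Any, Callable

CACHE_TTL_S = 5 * 60  # 5 minutes, as stated in the commit message
_memory_cache: dict[str, tuple[float, Any]] = {}


def cached(key: str, fetch: Callable[[], Any], ttl: float = CACHE_TTL_S,
           clock: Callable[[], float] = time.monotonic) -> Any:
    """Return the cached value for `key` if it is younger than `ttl` seconds, else refetch."""
    now = clock()
    hit = _memory_cache.get(key)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]
    value = fetch()
    _memory_cache[key] = (now, value)
    return value


def cached_video(artifact_id: int, cache_dir: Path, download: Callable[[int], bytes]) -> Path:
    """Download a rollout video at most once, using the artifact ID as the file name."""
    path = Path(cache_dir) / f"{artifact_id}.mp4"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".part")
        tmp.write_bytes(download(artifact_id))
        tmp.replace(path)  # rename into place so a half-written file is never served
    return path
```

Keying the disk cache on the artifact ID works because an artifact's contents do not change once uploaded; a new run produces a new artifact ID and therefore a new file.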

The Space should be deployed as a private Space under the huggingface
org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n HF Hub

The Gradio Space is now a standalone repo deployed to
https://huggingface.co/spaces/lerobot/health-dashboard (private).
Only the CI scripts and workflow changes belong in this repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pkooij merged commit e89e6d9 into feat/benchmark-ci Apr 8, 2026
pkooij deleted the feat/health-dashboard branch April 8, 2026 16:20