feat(ci): live health dashboard — GitHub API + Gradio Space by pkooij · Pull Request #3324 · huggingface/lerobot

pkooij · 2026-04-08T15:47:36Z

Summary

Adds CI infrastructure so the private LeRobot health dashboard at lerobot/health-dashboard has data to display. Stacked on top of #3319.

The Space itself lives in a standalone HF Spaces repo — this PR only adds the
two files needed on the lerobot side:

| File | Purpose |
| --- | --- |
| `scripts/ci/parse_eval_metrics.py` | Reads `eval_info.json` written by `lerobot-eval`, extracts `pc_success` + `n_episodes`, writes `metrics.json` |
| `.github/workflows/benchmark_tests.yml` | Adds "Parse metrics" + "Upload metrics" artifact steps after each eval (`if: always()`) |

How the dashboard works (no extra datastore):

- The Space queries the GitHub Actions API directly with a read-only fine-grained token (`Actions=read`, `Metadata=read` on `huggingface/lerobot` only).
- Each workflow is fetched individually via `/actions/workflows/{id}/runs?branch=main&per_page=1`, so scheduled/nightly jobs (Nightly Deps, Docker) are always shown regardless of run frequency.
- Benchmark jobs upload a `metrics.json` artifact; the Space downloads and parses it for success rate and episode count.
- Rollout videos are fetched from the artifact zip and cached on disk by artifact ID.
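The per-workflow query described above can be sketched with the standard library alone. This is an illustration, not the Space's code: the helper names are hypothetical, and passing the workflow file name in place of a numeric ID relies on the GitHub REST API accepting either form.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

API_ROOT = "https://api.github.com/repos/huggingface/lerobot"


def latest_run_url(workflow_id: str, branch: str = "main") -> str:
    """Build the URL for the most recent run of one workflow on a branch."""
    query = urllib.parse.urlencode({"branch": branch, "per_page": 1})
    return f"{API_ROOT}/actions/workflows/{workflow_id}/runs?{query}"


def fetch_latest_run(workflow_id: str, token: str) -> Optional[dict]:
    """Return the newest run of the workflow, or None if it has never run."""
    request = urllib.request.Request(
        latest_run_url(workflow_id),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        runs = json.load(response).get("workflow_runs", [])
    return runs[0] if runs else None


if __name__ == "__main__":
    # Only hits the network when the read-only token is available.
    token = os.environ.get("GITHUB_RO_TOKEN")
    if token:
        run = fetch_latest_run("benchmark_tests.yml", token)
        if run is not None:
            print(run["status"], run["conclusion"], run["html_url"])
```

Querying each workflow separately with `per_page=1` means an infrequent nightly job is never pushed off the first page by busier workflows, which is the behavior the list above calls out.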

Dashboard panels:

- Overall health banner (green / yellow / red)
- CI status table grouped by: Tests · Benchmarks · Build & Publish · Quality
- Success rate and duration trend charts (last 30 benchmark runs)
- Latest rollout video per benchmark (LIBERO, MetaWorld)
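The PR does not say how the banner color is derived. One plausible rule, shown purely as an assumption, is to reduce the latest `conclusion` value of each workflow run to a single color. The function name and the grouping of conclusions into a failing set are hypothetical.

```python
from typing import Iterable, Optional

# Assumption: these GitHub run conclusions count as hard failures.
FAILING = {"failure", "timed_out", "startup_failure"}


def overall_health(conclusions: Iterable[Optional[str]]) -> str:
    """Map the latest conclusion of each workflow to a banner color.

    A conclusion of None stands for a run that is still in progress.
    """
    results = list(conclusions)
    if any(c in FAILING for c in results):
        return "red"
    if not results or any(c != "success" for c in results):
        # No data, in-progress, cancelled, or skipped runs: not broken,
        # but not verified either.
        return "yellow"
    return "green"
```

For example, `overall_health(["success", None])` gives `"yellow"` under this rule, and a single `"failure"` turns the banner red regardless of the other entries.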

Test plan

- Verify `parse_eval_metrics.py` writes correct `metrics.json` after a libero/metaworld eval
- Verify `libero-metrics` / `metaworld-metrics` artifacts appear in the Actions UI
- Open lerobot/health-dashboard and confirm status table, charts, and videos load (requires `GITHUB_RO_TOKEN` Space secret to be set)

🤖 Generated with Claude Code

- spaces/health-dashboard/app.py: Gradio Space that queries the GitHub Actions API directly (no extra datastore). Shows benchmark status badges, success-rate and duration trend charts, and embeds the latest rollout video per benchmark. Results cached 5 min in-memory; video files cached on disk by artifact ID so downloads only happen once.
- spaces/health-dashboard/requirements.txt + README.md: Space card with setup instructions for the GITHUB_RO_TOKEN secret (actions:read, metadata:read only).
- scripts/ci/parse_eval_metrics.py: runs on the CI host after each eval, reads eval_info.json written by lerobot-eval, extracts pc_success and n_episodes, and writes metrics.json to the artifacts dir.
- .github/workflows/benchmark_tests.yml: add "Parse … metrics" and "Upload … metrics" steps (if: always()) after each eval so the dashboard has data even when the eval fails.

The Space should be deployed as a private Space under the huggingface org. Required secret: GITHUB_RO_TOKEN (fine-grained, read-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…n HF Hub

The Gradio Space is now a standalone repo deployed to https://huggingface.co/spaces/lerobot/health-dashboard (private). Only the CI scripts and workflow changes belong in this repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pkooij and others added 2 commits April 8, 2026 17:46

pkooij merged commit e89e6d9 into feat/benchmark-ci Apr 8, 2026

pkooij deleted the feat/health-dashboard branch April 8, 2026 16:20

pkooij mentioned this pull request Apr 8, 2026

feat(ci): benchmark smoke tests with isolated Docker images (LIBERO + MetaWorld) #3319

Open

7 tasks

pkooij merged 2 commits into feat/benchmark-ci from feat/health-dashboard
