Add autotune — autonomous MOT tracker optimization loop [rebase&merge]#346
Draft
- experiments/program.md: autoresearch contract (research question, HOTA≥60 target, hard boundaries, 7 research starting points: Kalman P/R init, two-threshold association, velocity attenuation, etc.)
- experiments/optimize_tracking.py: Optuna-based metric runner; n_trials=1 evaluates defaults; multi-core via multiprocessing+SQLite; agent updates search space as architecture evolves
- experiments/README.md: motivation, approach, target analysis (HOTA ceiling derivation), pre-flight checks, references
- pyproject.toml: add `optimize` dependency group (optuna[rdb], fire)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
autotrack/optimize_tracking.py
- --det-tag TAG CLI arg: overrides the directory suffix for any custom detector without touching _DET_SOURCE_TO_TAG; _validate_args and _resolve_sequences both accept it
- Multiprocessing progress bar: replaced pool.starmap with starmap_async + a polling loop that loads the SQLite study every 2 s and feeds a Rich Progress bar showing completed trials and live best HOTA (mirrors the existing single-worker callback approach)
- Module docstring updated with --det-tag usage example

autotrack/README.md
- Fixed cd experiments → cd autotrack; old --tracker sort --fast → positional syntax
- YOLO section replaced with YOLOX section (correct weights filename)
- RF-DETR section added as a standalone step
- New Custom detections section: dir layout, MOT format, --det-tag usage
- Pre-flight checks table updated (removed API key row, fixed commands)
- Fixed /optimize campaign experiments/ → autotrack/
- Fixed broken Files table row for optimize_tracking.py

autotrack/program.md
- generate_detections.py added to scope_files
- Weights filename corrected (yolox_x.pth → bytetrack_x_mot17.pth.tar)
- RF-DETR and custom detector quickstart notes added below pre-flight table

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- generate_detections.py: remove YOLOX backend (loader, predictor, frame processing); add YOLO-World via inference-models with center→top-left coord conversion; rename rfdetr-l → rfdetr/l to match yolo_world/l slash notation
- optimize_tracking.py: swap yolox→yoloworld in _DET_SOURCE_TO_TAG; extract _run_parallel_study; fix multiline ternaries to if/else; use setattr() for dynamic Kalman attrs (mypy); pass >3 args as kwargs
- best_config.json: drop broken yolox entry (HOTA=7.7); add real Optuna results for yoloworld, rfdetr, dpm across all three trackers
- pyproject.toml: remove YOLOX git source + no-build-isolation; add inference-models>=0.19.0

---
Co-authored-by: Claude Code <noreply@anthropic.com>
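The center→top-left conversion mentioned above is plain box arithmetic. A minimal sketch, assuming the backend returns `(cx, cy, w, h)` rows in pixels; `center_to_top_left` is a hypothetical helper, not an `inference-models` API:

```python
import numpy as np


def center_to_top_left(boxes_cxcywh) -> np.ndarray:
    """Convert (N, 4) boxes from (cx, cy, w, h) to MOT-style (left, top, w, h)."""
    boxes = np.asarray(boxes_cxcywh, dtype=float)
    out = boxes.copy()
    out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2.0  # left = cx - w / 2
    out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2.0  # top = cy - h / 2
    return out  # width and height are unchanged
```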
- search_space.json: expand 16 boundary-hugging parameters across all three trackers (lost_track_buffer, track_activation_threshold, minimum_iou_threshold, high_conf_det_threshold, q_scale/r_scale/p_scale, velocity_decay, q_miss_alpha, max_interpolation_gap, p_reset_threshold, direction_consistency_weight); add log=true to lost_track_buffer and minimum_iou_threshold (all trackers)
- optimize_tracking.py: pass log= to suggest_int so log-scale int parameters are respected
- best_config.json: bytetrack/rfdetr updated to HOTA 45.08 from new run
- uv.lock: regenerated after yolox removal

---
Co-authored-by: Claude Code <noreply@anthropic.com>
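Forwarding `log=` matters because Optuna's `suggest_int` and `suggest_float` sample uniformly unless told otherwise. A sketch of how one search-space entry might be dispatched, assuming a hypothetical JSON schema with `type`, `low`, `high`, `log`, and `choices` keys (the actual `search_space.json` layout may differ):

```python
from typing import Any, Mapping


def suggest_param(trial: Any, name: str, spec: Mapping[str, Any]) -> Any:
    """Dispatch one (hypothetical-schema) search-space entry to an Optuna suggest_* call."""
    kind = spec["type"]
    log = bool(spec.get("log", False))
    if kind == "int":
        # Forward log= so log-scale integer ranges are sampled log-uniformly.
        return trial.suggest_int(name, int(spec["low"]), int(spec["high"]), log=log)
    if kind == "float":
        return trial.suggest_float(name, float(spec["low"]), float(spec["high"]), log=log)
    if kind == "categorical":
        return trial.suggest_categorical(name, list(spec["choices"]))
    raise ValueError(f"unknown parameter type {kind!r} for {name!r}")
```

Optuna rejects `log=True` when the lower bound is not positive (integer ranges need `low >= 1`, float ranges need `low > 0`), so log-scale entries must start above zero.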
…mation (ORU)
- Add oru_enabled parameter to ByteTrackKalmanBoxTracker: on re-detection after occlusion, replay virtual predict+update cycles along a linearly interpolated trajectory to re-estimate velocity
- Expose oru_enabled in optimize_tracking.py _build_tracker and _define_search_space
- Add oru_enabled to default_config.json and search_space.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…0.05)
- Add stage2_iou_threshold=0.05 param to ByteTrackTracker; stage-1 keeps minimum_iou_threshold=0.1
- Lower stage-2 threshold recovers more low-confidence detections without breaking high-conf stage
- Expose to Optuna via search_space.json; add to default_config.json and optimize_tracking.py

---
Co-authored-by: OpenAI Codex <codex@openai.com>
…larity
- Add iou_age_weight=0.03: scale stage-1 IoU similarity by 1/(1+w*lost_frames) for each track
- Biases Hungarian assignment toward recently-seen tracks; reduces stale-prediction false matches
- iou_age_weight=0.03 is active at default params; Optuna range [0.0, 0.2] log-scale

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Apply age discount only to cost matrix (not threshold check): raw IoU used for min-threshold gate, discount only biases solver assignment toward active tracks
- Tighten Optuna search range [0.0, 0.2] -> [0.0, 0.1]
- Fix pre-existing bug: optimize_tracking.py final re-eval now applies _apply_kalman_patch

---
Co-authored-by: Claude Code <noreply@anthropic.com>
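The split between the solver matrix and the threshold gate can be sketched as follows. This assumes a `(tracks × detections)` IoU matrix and uses SciPy's `linear_sum_assignment` as the Hungarian solver; the function name is hypothetical and the real ByteTrack code may structure it differently:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate_with_age_discount(iou, lost_frames, iou_age_weight, min_iou):
    """Match tracks (rows) to detections (columns).

    iou: raw (n_tracks, n_dets) IoU matrix.
    lost_frames: (n_tracks,) frames since each track was last updated.
    """
    iou = np.asarray(iou, dtype=float)
    lost = np.asarray(lost_frames, dtype=float)
    # Solver-only similarity: scale each track row by 1 / (1 + w * lost_frames).
    solver_iou = iou / (1.0 + iou_age_weight * lost)[:, None]
    rows, cols = linear_sum_assignment(solver_iou, maximize=True)
    # Gate on the RAW IoU, so the discount changes ranking but never rejects a pair.
    keep = iou[rows, cols] >= min_iou
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))
```

With one detection overlapping a stale track (IoU 0.6, lost 10 frames) and a fresh track (IoU 0.5), `iou_age_weight=0` assigns it to the stale track, while `0.1` hands it to the fresh one; a pair whose raw IoU is below `min_iou` is still dropped either way.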
Apply Optuna-found parameter values as new defaults:
- lost_track_buffer 30→62
- track_activation_threshold 0.7→0.314
- q_scale 0.01→0.00246
- r_scale 0.1→0.292
- p_scale 1.0→7.34
- velocity_decay 0.95→0.817
- q_miss_alpha 0.1→0.461
- max_interpolation_gap 20→30
- p_reset_threshold 5→13

HOTA 56.781→57.424 (+1.13%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Pull request overview
This PR introduces the new autotrack/ workflow for autonomous and Optuna-based optimization of MOT17 trackers, and updates core tracker internals to support additional post-processing and association/Kalman behaviors that the optimization loop can tune and validate.
Changes:
- Added `autotrack/` tooling: Optuna runner (optimize_tracking.py), detection generation (generate_detections.py), visualization utilities, and configuration/artifact files (default_config.json, search_space.json, best_config.json, program.md).
- Extended ByteTrack and SORT utilities with new association/Kalman mechanics and MOT-gap interpolation.
- Added an `optimize` dependency group and adjusted repo formatting/ignore configs to support the new workflow.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| trackers/core/sort/utils.py | Adds MOT-format short-gap interpolation helper used by autotrack evaluation output. |
| trackers/core/bytetrack/tracker.py | Adds stage-2 IoU threshold and IoU age discount for stage-1 ranking; updates association gating logic. |
| trackers/core/bytetrack/kalman.py | Adds velocity decay, miss-noise inflation, P-reset, and ORU mechanics to ByteTrack Kalman tracker. |
| README.md | Badge formatting change (single-line). |
| pyproject.toml | Adds optimize dependency group and uv git source for onnx-simplifier. |
| docs/trackers/ocsort.md | Reflowed paragraph formatting. |
| docs/trackers/comparison.md | Reflowed admonition formatting. |
| CODE_OF_CONDUCT.md | Reflowed paragraph formatting. |
| autotrack/visualize_detections.py | New utility to render MOT detections on frames. |
| autotrack/search_space.json | New Optuna parameter search space definitions per tracker. |
| autotrack/README.md | New documentation for the autotrack workflow and benchmarks. |
| autotrack/program.md | New campaign contract/spec for the autonomous optimization loop. |
| autotrack/optimize_tracking.py | New Optuna study runner + evaluation harness using trackers.eval. |
| autotrack/generate_detections.py | New script to generate MOT17 detections via RF-DETR / YOLO-World backends. |
| autotrack/default_config.json | New baseline/default parameter set for --n-trials 1 runs. |
| autotrack/best_config.json | New committed “best known” tuned configs used for warm-starting/guarding. |
| .pre-commit-config.yaml | mdformat configured with --wrap=no (drives markdown reflow behavior). |
| .gitignore | Adjusts ignores (including .python-version) and adds autotrack output/cache patterns. |
07f7488 to a62024a
…ecovery

Short occlusions (1-4 frames) are handled well by velocity decay alone; ORU trajectory replay is beneficial only for longer gaps where velocity has drifted.

HOTA 57.424→57.813 (+0.686%), IDF1 69.573→70.009

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- bytetrack/sdp Optuna result: 58.753 (was 56.115 before i10-i11)
- New optimal params include oru_threshold=14, q_scale/r_scale/p_scale all ~10x lower

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- q_scale 0.00246→0.000202, r_scale 0.292→0.0441, p_scale 7.34→0.731 (tighter Kalman: trust measurements more)
- oru_threshold 5→14, velocity_decay 0.817→0.774, q_miss_alpha 0.461→0.282
- stage2_iou_threshold 0.05→0.233, lost_track_buffer 62→52, p_reset_threshold 13→26
- HOTA 57.813→58.753 (+1.30%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Confidence boost in Hungarian cost: solver_iou *= (1 + w * conf[det])
- Neutral at all tested defaults (0.0–0.5); added to Optuna search space [0.0, 1.0]
- IDSW improved 297→293 at w=0.3 but HOTA regressed; w=0.1 exactly neutral

---
Co-authored-by: Claude Code <noreply@anthropic.com>
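The boost formula above is a column-wise scaling of the solver matrix. A sketch, assuming detections index the columns; `boost_by_confidence` is a hypothetical name. When two detections overlap a track equally, the boost breaks the tie toward the higher-confidence one:

```python
import numpy as np


def boost_by_confidence(solver_iou, det_conf, conf_cost_weight):
    """Return solver_iou * (1 + w * conf[det]), applied column-wise.

    Only the matrix handed to the assignment solver is boosted; the threshold
    gate should keep using the unboosted similarity.
    """
    solver_iou = np.asarray(solver_iou, dtype=float)
    conf = np.asarray(det_conf, dtype=float)
    return solver_iou * (1.0 + conf_cost_weight * conf)[None, :]
```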
- Mature-track-only stage-2: only tracks with >= N updates participate in low-conf recovery
- Neutral at N=0,1; regresses at N>=2 (ghost exclusion hurts legitimate young tracks)
- Added to Optuna search space [0, 5] for future joint optimisation

---
Co-authored-by: Claude Code <noreply@anthropic.com>
699e62f to 1bc1138
…disabled)
- Add _giou_matrix() helper and giou_blend param to ByteTrackTracker stage-1 cost
- giou_blend=0.0 default keeps metric at 58.753 (best found 0.32 gave +0.092%, below 0.1% threshold)
- Add giou_blend to search_space.json [0.0, 1.0] and optimize_tracking.py wiring
- Fix best_config.json trailing newline

---
Co-authored-by: Claude Code <noreply@anthropic.com>
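For reference, GIoU subtracts from IoU the fraction of the smallest enclosing box that the union does not cover, so it goes negative for disjoint boxes. A vectorized sketch in `(x1, y1, x2, y2)` format, assuming non-degenerate boxes; the linear blend `(1 - giou_blend) * IoU + giou_blend * GIoU` is an assumption about how `giou_blend` enters the stage-1 cost, and both helper names are hypothetical:

```python
import numpy as np


def iou_and_giou(a, b):
    """Pairwise IoU and GIoU between boxes a (N, 4) and b (M, 4)."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0.0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0.0, None)
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Area of the smallest box enclosing both boxes.
    cw = np.maximum(a[..., 2], b[..., 2]) - np.minimum(a[..., 0], b[..., 0])
    ch = np.maximum(a[..., 3], b[..., 3]) - np.minimum(a[..., 1], b[..., 1])
    area_c = cw * ch
    return iou, iou - (area_c - union) / area_c


def blended_similarity(a, b, giou_blend):
    """Hypothetical stage-1 similarity: linear mix of IoU and GIoU."""
    iou, giou = iou_and_giou(a, b)
    return (1.0 - giou_blend) * iou + giou_blend * giou
```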
…earch)
- 1000-trial Optuna search over expanded search space (new: conf_cost_weight, stage2_min_updates, giou_blend)
- HOTA 58.753→58.862 (+0.185%), IDSW 297→269 (-9.4%)
- Key changes: high_conf_det_threshold 0.608→0.795, oru_threshold 14→0, Kalman looser (q_scale/r_scale ~14x), minimum_consecutive_frames 2→1, stage2_min_updates 5, giou_blend 0.396, conf_cost_weight 0.170

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- HOTA 58.862→58.961 (+0.168%), IDSW 269→266, IDF1 71.365→71.730
- Optuna search was capped at stage2_min_updates≤5; manual scan found peak at 12 (cliff at 14+)
- Widen search_space.json high: 5→15 so future guard runs can explore the full range

---
Co-authored-by: Claude Code <noreply@anthropic.com>
---
Co-authored-by: Claude Code <noreply@anthropic.com>
- HOTA 58.961→59.031 (+0.119%), IDSW 266→262, IDF1 71.730→71.852
- max_interpolation_gap 45→48 (Optuna undershoot, true peak at 48)
- giou_blend 0.3963→0.42 (refined from 0.396 Optuna result)
- velocity_decay 0.827→0.82 (slight tightening of decay)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
15fce38 to cf03dd4
- Runs tracker over all 21 MOT17 test sequences (7 IDs × 3 public detectors)
- Produces a flat ZIP ready for CodaBench upload (21 files, MOT 10-col format)
- Default: --det-source all (DPM/FRCNN/SDP run separately); single source replicates to all 3 DET slots
- Uses tracker constructor defaults; dataset path resolved relative to script
- Gitignore: ignore autotrack/*.zip to keep submission ZIPs out of repo

---
Co-authored-by: Claude Code <noreply@anthropic.com>
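The packaging step can be sketched with the standard library alone. This assumes results are already collected per sequence as `(frame, id, left, top, width, height, conf)` tuples and that files are named after the sequence (e.g. `MOT17-01-SDP.txt`); the helper names are hypothetical, not the script's actual functions:

```python
import io
import zipfile


def mot_lines(rows) -> str:
    """Format (frame, id, left, top, width, height, conf) tuples as MOT 10-column text."""
    out = []
    for frame, track_id, left, top, width, height, conf in rows:
        # Columns 8-10 are 3D world coordinates, unused for 2D tracking, so -1.
        out.append(
            f"{int(frame)},{int(track_id)},{left:.2f},{top:.2f},"
            f"{width:.2f},{height:.2f},{conf:.2f},-1,-1,-1"
        )
    return "".join(line + "\n" for line in out)


def build_submission_zip(results) -> bytes:
    """results: {sequence_name: rows}. Returns a flat ZIP with one <name>.txt each."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, rows in sorted(results.items()):
            zf.writestr(f"{name}.txt", mot_lines(rows))  # no directories: flat layout
    return buf.getvalue()
```

Writing the returned bytes to a `.zip` file yields one flat archive; with the layout described above it would hold 21 entries (7 sequences × 3 detectors).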
- Rename directory via git mv
- Update all internal references (cd paths, study name, README tables, CLI hints)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
34d440c to 3653d30
… & guard script to `guard.py`
- Replace hardcoded bytetrack/sdp in metric_cmd, notes, and optuna commands with {algo} and {det_source} template tokens resolved at run time by the /optimize skill
- Config defaults: algo=bytetrack, det_source=sdp (no behaviour change for existing runs)
- Remove fixed target (varies per tracker); campaign now runs to max_iterations
- Consolidate Phase 1 findings as ByteTrack-specific history; flag which improvements are not yet in SORT/OC-SORT
- Drop hardcoded class names and per-tracker baselines from hard boundaries; add tuned HOTA reference table

- Replaced inline guard logic in `program.md` with a dedicated `guard.py` script.
- Ensures modularity and improves maintainability for HOTA regression checks.

- Provides isolated execution environment for `/optimize` runs
- Includes dependencies for OpenCV, git, and build tools
- Configures Python environment with prebuilt venv for faster runtime setup

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Add velocity_decay to SORTKalmanBoxTracker: shrinks velocity components each missed frame (default 0.82), prevents runaway linear extrapolation
- Add q_miss_alpha: inflates Q proportionally to time_since_update for lost tracks (default 0.8), widens uncertainty so re-detection gets higher gain
- Add p_reset_threshold: resets P to identity after gaps >= threshold frames (default 10), discards stale accumulated uncertainty on re-detection
- Wire all three params through SORTTracker.__init__ and _spawn_new_trackers
- Add to search_space.json and default_config.json for sort
- HOTA: 53.217 → 53.738 (+0.521, +0.98%) at default params on sdp

---
Co-authored-by: Claude Code <noreply@anthropic.com>
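The three knobs can be illustrated on a generic constant-velocity Kalman filter using the commit's defaults (0.82, 0.8, 10). This is a hypothetical sketch, not `SORTKalmanBoxTracker`: it assumes a state of box coordinates followed by their velocities, identity initial covariance, and a SORT-style `time_since_update` counter that `predict` increments and `update` resets:

```python
import numpy as np


class DecayingKalmanBox:
    """Constant-velocity Kalman filter with state [box coords..., velocities...]."""

    def __init__(self, box, q_scale=0.01, r_scale=0.1, velocity_decay=0.82,
                 q_miss_alpha=0.8, p_reset_threshold=10):
        k = len(box)
        self.k = k
        self.x = np.concatenate([np.asarray(box, dtype=float), np.zeros(k)])
        self.F = np.eye(2 * k)
        self.F[:k, k:] = np.eye(k)  # position += velocity each frame
        self.H = np.hstack([np.eye(k), np.zeros((k, k))])  # observe positions only
        self.Q = q_scale * np.eye(2 * k)
        self.R = r_scale * np.eye(k)
        self.P = np.eye(2 * k)
        self.velocity_decay = velocity_decay
        self.q_miss_alpha = q_miss_alpha
        self.p_reset_threshold = p_reset_threshold
        self.time_since_update = 0

    def predict(self):
        if self.time_since_update > 0:
            # Previous frame was missed: shrink velocity to curb runaway extrapolation.
            self.x[self.k:] *= self.velocity_decay
        # Inflate process noise in proportion to how long the track has been lost.
        q = self.Q * (1.0 + self.q_miss_alpha * self.time_since_update)
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + q
        self.time_since_update += 1
        return self.x[: self.k]

    def update(self, box):
        if self.p_reset_threshold > 0 and self.time_since_update >= self.p_reset_threshold:
            # Long gap: drop stale accumulated uncertainty before the correction.
            self.P = np.eye(2 * self.k)
        y = np.asarray(box, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2 * self.k) - K @ self.H) @ self.P
        self.time_since_update = 0
```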
…ion for SORT
- On re-detection after >= oru_threshold missed frames, override Kalman velocity with virtual trajectory: (current_bbox - last_observed_bbox) / gap_frames
- Store _last_observed_bbox at each update for ORU computation
- Add oru_threshold param (default 3); technique from OC-SORT paper
- Register in search_space.json [0, 15] and default_config.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
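The velocity override above reduces to one division over the gap. A sketch, assuming `gap_frames` counts frames between the last observation and the re-detection; the function name is hypothetical:

```python
import numpy as np


def oru_velocity(current_bbox, last_observed_bbox, gap_frames, oru_threshold=3):
    """Virtual-trajectory velocity after a gap, or None when ORU should not fire.

    Assumes a constant-velocity path between the two observations, so the
    per-frame velocity is total displacement divided by elapsed frames.
    """
    if gap_frames <= 0 or gap_frames < oru_threshold:
        return None
    cur = np.asarray(current_bbox, dtype=float)
    last = np.asarray(last_observed_bbox, dtype=float)
    return (cur - last) / float(gap_frames)
```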
…lation for SORT
- Add conf_cost_weight param to SORTTracker: boosts Hungarian solver matrix by detection confidence (tiebreaker only; IoU gate uses raw IoU), default 0.2
- Add conf_cost_weight to sort search_space.json [0.0, 1.0] and default_config.json
- Set max_interpolation_gap default from 0 → 30 in default_config.json (activates existing interpolate_mot_gaps post-processing already wired in optimize_tracking.py)
- HOTA: 53.217 → 54.506 (+1.289, +2.4%) at default params on sdp

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…nment to SORT
- Scale DIoU cost matrix by detection confidence (1 + conf_cost_weight * conf) to break ties in favor of higher-confidence detections
- Threshold gate uses raw DIoU so the boost only affects ranking, not filtering
- Add conf_cost_weight param (default 0.0 = disabled) to SORTTracker constructor
- Update search_space.json, default_config.json, optimize_tracking.py

Co-authored-by: Claude Code <noreply@anthropic.com>
…or faster track confirmation
- track_activation_threshold: 0.25 → 0.9725 (strict initiation, fewer FP tracks)
- minimum_consecutive_frames: 2 → 1 (immediate confirmation, safe with high threshold)
- max_interpolation_gap: 30 → 57 (more gap bridging, fewer ID switches)
- minimum_iou_threshold: 0.3 → 0.275 (closer to optimal)
- lost_track_buffer: 30 → 26 (tuned optimal)

HOTA: 54.959 → 56.136 (+1.177, +2.14%)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Discount DIoU similarity for lost tracks by 1/(1 + iou_age_weight * lost_frames)
- Biases solver to prefer active tracks over stale Kalman predictions, reducing ID switches
- Threshold gate uses raw DIoU so valid matches are never rejected by the discount
- Add iou_age_weight param (default 0.0 = disabled) to SORTTracker constructor
- Update search_space.json, default_config.json, optimize_tracking.py

Co-authored-by: Claude Code <noreply@anthropic.com>
…ion to SORT
- Split detections by confidence threshold into high (stage 1) and low (stage 2) groups
- Stage 1 matches high-confidence dets to all tracks using DIoU + age discount + conf boost
- Stage 2 matches low-confidence dets to unmatched tracks using a lower IoU threshold
- New tracks only spawned from unmatched high-confidence detections
- Refactor _get_associated_indices into static _match method for reuse across stages
- Extract _build_solver_iou helper for age discount + confidence boost application
- Add high_conf_det_threshold (default 0.0 = disabled) and stage2_iou_threshold params
- Update search_space.json, default_config.json, optimize_tracking.py

Co-authored-by: Claude Code <noreply@anthropic.com>
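The two-stage flow above can be sketched with a single similarity matrix. This simplification assumes one precomputed `(tracks × detections)` matrix for both stages (the commit uses DIoU with age discount and confidence boost in stage 1), uses SciPy's `linear_sum_assignment`, and the function names are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(sim, min_sim):
    """Optimal assignment on a (tracks, dets) similarity matrix, gated by min_sim."""
    sim = np.asarray(sim, dtype=float)
    if sim.size == 0:
        return []
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]


def two_stage_associate(sim, det_conf, high_conf_det_threshold, min_iou, stage2_iou):
    """Return (matches as (track, det) pairs, unmatched high-confidence det indices)."""
    sim = np.asarray(sim, dtype=float)
    conf = np.asarray(det_conf, dtype=float)
    high = np.flatnonzero(conf >= high_conf_det_threshold)
    low = np.flatnonzero(conf < high_conf_det_threshold)

    # Stage 1: high-confidence detections against every track.
    stage1 = [(t, int(high[j])) for t, j in hungarian_match(sim[:, high], min_iou)]

    # Stage 2: low-confidence detections against tracks stage 1 left unmatched,
    # using the looser stage2_iou gate.
    used = {t for t, _ in stage1}
    free = np.array([t for t in range(sim.shape[0]) if t not in used], dtype=int)
    stage2 = [
        (int(free[i]), int(low[j]))
        for i, j in hungarian_match(sim[np.ix_(free, low)], stage2_iou)
    ]

    # Only unmatched high-confidence detections are candidates for new tracks.
    matched_dets = {d for _, d in stage1}
    unmatched_high = [int(d) for d in high if int(d) not in matched_dets]
    return stage1 + stage2, unmatched_high
```

With `high_conf_det_threshold=0.0` every detection lands in stage 1 and stage 2 receives nothing, which matches the "0.0 = disabled" default in the commit.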
- DIoU/two-stage/Kalman code changes make old 56.129 threshold stale
- Measured new default-param HOTA=55.656 with consolidated branch code

---
Co-authored-by: Claude Code <noreply@anthropic.com>
DIoU returns negative values (not 0.0) for non-overlapping boxes because of the centre-distance penalty term; the old assertion assumed pure IoU semantics.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
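A small DIoU sketch shows why: for disjoint boxes IoU is 0 while the centre-distance term ρ²/c² is positive, so the result is strictly negative. The helper is hypothetical, assumes `(x1, y1, x2, y2)` boxes, and is not the library's `_compute_diou_matrix`:

```python
import numpy as np


def diou_matrix(a, b) -> np.ndarray:
    """Pairwise DIoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2)."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0.0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0.0, None)
    inter = iw * ih
    iou = inter / (area_a + area_b - inter)
    # rho^2: squared distance between box centres.
    rho2 = ((a[..., 0] + a[..., 2]) - (b[..., 0] + b[..., 2])) ** 2 / 4.0 + (
        (a[..., 1] + a[..., 3]) - (b[..., 1] + b[..., 3])
    ) ** 2 / 4.0
    # c^2: squared diagonal of the smallest enclosing box.
    cw = np.maximum(a[..., 2], b[..., 2]) - np.minimum(a[..., 0], b[..., 0])
    ch = np.maximum(a[..., 3], b[..., 3]) - np.minimum(a[..., 1], b[..., 1])
    return iou - rho2 / (cw ** 2 + ch ** 2)
```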
New code (DIoU + Kalman dynamics + conf-weight + age-discount + two-stage) tuned by Optuna achieves 57.675 vs 53.217 baseline (+8.4%).

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- README Algorithms table: add MOT17 HOTA (tuned) column; SORT 55.7→57.7
- autotune/program.md: SORT Phase 1 findings section with 9 kept changes, 5 reverted, tuned best config (HOTA=57.7, +8.4% from 53.217 baseline)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
- Set ocsort.max_interpolation_gap from 0 to 20 in default_config.json
- Infrastructure already wired: optimize_tracking.py calls interpolate_mot_gaps() when max_gap > 0
- ByteTrack Phase 1 precedent: +1.666% HOTA at same max_gap value

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…for ocsort
- Extend _apply_kalman_patch with ocsort elif branch: monkey-patches XCYCSRStateEstimator._create_filter to multiply paper-default Q/R/P by q_scale/r_scale/p_scale scalars after original init; defaults 1.0 preserve baseline HOTA
- Add q_scale (0.001–10), r_scale (0.01–100), p_scale (0.01–100) to ocsort section of search_space.json (all log-scale)
- Add q_scale=1.0, r_scale=1.0, p_scale=1.0 defaults to ocsort section of default_config.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
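The patching pattern described above can be sketched generically. `PaperDefaultEstimator` is a hypothetical stand-in; the real `XCYCSRStateEstimator._create_filter` internals (attribute names, matrix shapes) are not shown in this PR description and are assumed here:

```python
import functools

import numpy as np


class PaperDefaultEstimator:
    """Hypothetical stand-in for an estimator whose _create_filter builds Q, R, P."""

    def _create_filter(self):
        self.Q = 0.01 * np.eye(7)
        self.R = np.eye(4)
        self.P = 10.0 * np.eye(7)


def apply_noise_scale_patch(cls, q_scale=1.0, r_scale=1.0, p_scale=1.0):
    """Wrap cls._create_filter so Q/R/P are multiplied after the original init.

    Returns the original function so the caller can restore it.
    """
    original = cls._create_filter

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        self.Q = self.Q * q_scale  # scale 1.0 leaves the paper defaults untouched
        self.R = self.R * r_scale
        self.P = self.P * p_scale
        return result

    cls._create_filter = patched
    return original
```

Returning the original method lets a caller restore it, which matters when trials with different scales run one after another in the same process.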
… association
- _get_iou_matrix in ocsort/utils.py now uses _compute_diou_matrix from sort/utils.py (center-distance penalty improves near-miss association)
- OCR stage sv.box_iou_batch replaced with _compute_diou_matrix (consistent DIoU across both association stages)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…ation
- Promote Optuna-found best ocsort config (DIoU-calibrated): min_iou_thr 0.095→0.061, q_scale 1.214→0.0072, r_scale 14.31→0.136, p_scale 47.16→9.74
- Set conf_cost_weight=0.258 (Optuna best); together with DIoU yields HOTA=58.652
- Add conf_cost_weight param (default 0.0) to OCSORTTracker; boosts high-confidence detections in solver cost matrix while keeping raw IoU gate unchanged
- Apply confidence boost in both primary (OCM) and OCR association stages
- Wire conf_cost_weight into _build_tracker ocsort block and search_space.json

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…tale lost tracks
- Add iou_age_weight param (default 0.0) to OCSORTTracker; discounts stale tracks' solver cost by 1/(1+iou_age_weight*(tsu-1)) pushing them to OCR stage; gate check unaffected
- Wire into _build_tracker ocsort block; add to search_space.json ocsort (range 0-0.5); default_config.json default 0.0

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…on in ocsort
- Add p_reset_threshold to OCSORTTracklet.update(): after gap >= threshold frames, reset kf.P to identity on re-detection, discarding stale accumulated uncertainty
- Thread p_reset_threshold through OCSORTTracker.__init__ and _spawn_new_tracklets
- Wire into _build_tracker ocsort; add to search_space.json (range 0-30); default_config.json p_reset_threshold=0

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…e dynamics
- Add velocity_decay and q_miss_alpha params to OCSORTTracklet.predict(): attenuate velocity components and inflate Q each missed frame to reduce drift on lost tracks
- Thread both params from OCSORTTracker.__init__ through _spawn_new_tracklets to tracklet
- Expose both in autotune search_space.json (velocity_decay 0.5–1.0; q_miss_alpha 0.0–2.0)
- Set neutral defaults in default_config.json (velocity_decay=1.0, q_miss_alpha=0.0)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
…earch
- Promote velocity_decay=0.926, q_miss_alpha=0.512, p_reset=8, iou_age_weight=0.428
- Also: conf_cost_weight=0.970, q_scale=0.720, r_scale=1.189, p_scale=0.095
- New SDP baseline 58.905 (+0.43% over previous best 58.652)

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
- SDP table: fill OC-SORT autotune+Optuna row (HOTA=58.905, IDF1=71.636, MOTA=66.396, IDSW=291) and SORT row (HOTA=58.026)
- Journal: add SORT Phase 1 section (9 kept, 5 reverted, +8.4%)
- Journal: add OC-SORT Phase 1 section (7 iters + Codex, +10.4%) with positive experiments table, code features, and key lesson
- README Algorithms table: OC-SORT MOT17 HOTA (tuned) 57.9→58.9
- autotune/program.md: OC-SORT Phase 1 findings section: 7 kept changes, tuned best config (HOTA=58.905, +10.4% from 53.351 baseline), Optuna insights

---
Co-authored-by: Claude Code <noreply@anthropic.com>
Keep "already in the code" tables (prevent duplicate work) but strip dataset-specific verdicts that anchor future agents against valid hypotheses:
- ByteTrack Phase 1 "tried and reverted" block removed
- SORT Phase 1 "tried and reverted" sub-section removed
- OC-SORT Phase 1 "Optuna findings" paragraph removed

Full campaign history remains in autotune/README.md Journal section.

---
Co-authored-by: Claude Code <noreply@anthropic.com>
3cccd8e to 5d61be1
Title changed: autotrack → autotune (autonomous MOT tracker optimization loop [rebase&merge])
Summary
MOT tracker quality depends on two largely independent axes: algorithm design and hyperparameter tuning. Most published improvements conflate them: a well-tuned weaker algorithm routinely beats a poorly tuned stronger one, making it hard to isolate what actually matters. This PR separates the axes by adding `autotrack/`, an autonomous optimization loop for SORT, ByteTrack, and OC-SORT on MOT17.

The goal is both practical (better trackers, reproducible tuning) and scientific (the experiment log, including every reverted change, is itself a research artifact).
Approach
Three progressive layers build on each other:
Layer 1: SOTA trackers with solid defaults. The existing `trackers/core/` implementations of SORT, ByteTrack, and OC-SORT are already competitive out of the box. This layer is the foundation; `autotrack/` does not replace it.

Layer 2: Optuna extracts the best from the existing parameter surface. `optimize_tracking.py` runs an Optuna study over the tracker's exposed hyperparameters (Kalman noise scales, confidence thresholds, buffer sizes). No code changes, pure tuning. FRCNN results gain 1–2.5 HOTA points; SDP gains 2–4 points. This layer alone is useful as a standalone tuning tool and can be adopted without running the agent loop.

Layer 3: autotrack goes beyond tuning by making algorithmic improvements. This is the novel contribution. An autonomous agent iterates over structural code changes (state representation, association strategy, camera motion compensation, Kalman mechanics), measures HOTA at fixed default parameters after each change, keeps improvements, and reverts regressions. Optuna acts as a second-pass validator after each kept change to confirm the improvement is real and not a tuning artifact. The iteration log is JSONL and captures every attempt, kept or reverted.
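The Layer 3 keep/revert loop can be summarized in a few lines. A hypothetical sketch: `evaluate`, `apply`, and `revert` stand in for running the metric at default parameters and editing or reverting code, the JSONL record fields are assumptions, and the Optuna validation pass after each kept change is omitted:

```python
import json
from typing import Callable, Iterable


def run_campaign(
    changes: Iterable[str],
    evaluate: Callable[[], float],
    apply: Callable[[str], None],
    revert: Callable[[str], None],
    emit: Callable[[str], object],
    min_gain: float = 0.0,
) -> float:
    """Greedy keep/revert loop; returns the best HOTA reached."""
    best = evaluate()  # baseline at fixed default parameters
    for i, change in enumerate(changes, start=1):
        apply(change)
        score = evaluate()
        kept = score > best + min_gain
        if kept:
            best = score
        else:
            revert(change)  # regressions (or sub-threshold gains) are undone
        # One JSON object per line, for kept and reverted attempts alike.
        emit(json.dumps({"iteration": i, "change": change, "hota": score, "kept": kept}) + "\n")
    return best
```

To log to disk, pass the `write` method of a file opened in append mode as `emit`.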
Two tools govern the loop:

| Command | Purpose |
|---|---|
| `optimize_tracking.py --n-trials 1` | Evaluates the defaults in `default_config.json` |
| `optimize_tracking.py --n-trials N` | … `best_config.json`, validates tuned ceiling |

The agent is explicitly permitted to update `optimize_tracking.py` as the tracker architecture evolves: adding parameters that newly exist, removing ones absorbed into the implementation, and tightening search ranges as knowledge accumulates.

Benchmarks
MOT17-val, full 7-sequence eval.
- `Defaults` = fixed params from `default_config.json`, no tuning.
- `+Optuna` = n=500 trials.
- `+autotrack + Optuna` = in progress.

FRCNN public detections (bundled, no GPU)
SDP public detections (bundled, no GPU)
Estimated ceiling with code improvements + Optuna on FRCNN: ~61.9 HOTA (vs ~56.0 for tuning alone), derived from the DetA/AssA decomposition, HOTA ≈ √(DetA · AssA). DetA is bounded by the detector (~0.57–0.62 for FRCNN), but AssA has substantial headroom, from ~0.55 to ~0.65, via better association logic.
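The ceiling arithmetic can be checked directly. HOTA at each localization threshold α is the geometric mean of DetA and AssA, and the reported score averages over α, so applying the square root to averaged values is an approximation. The DetA values below are illustrative picks within the quoted FRCNN range that reproduce the quoted figures; the PR does not state which values it used:

```python
import math


def hota_estimate(det_a: float, ass_a: float) -> float:
    """Approximate HOTA in points as 100 * sqrt(DetA * AssA)."""
    return 100.0 * math.sqrt(det_a * ass_a)


tuning_only = hota_estimate(0.57, 0.55)  # about 56.0
with_assoc_gains = hota_estimate(0.59, 0.65)  # about 61.9
```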
Hard guarantees
Three invariants are enforced by `program.md` and cannot be relaxed by the agent:
- … `det/det.txt`.
- `gt/gt.txt` is never accessed at inference time.
- … `trackers.eval` only. `trackers/eval/` is out of scope for agent edits. The metric computation is identical across all iterations; the agent cannot move the goalposts.

Quick start
To run the autonomous agent loop, point any coding agent at `program.md`. For example, start `claude` and enter the prompt `Read program.md and start the experiment loop.`

References