
[Website][Experiment] Share Playground #3170

Draft
adamziel wants to merge 23 commits into trunk from add-share-button-php-relay

Conversation


adamziel commented Jan 22, 2026

Motivation for the change, related issues

You boot a Playground in your browser, click Share, and a friend visiting the link sees and interacts with the same WordPress instance — without you spinning up a server, exposing a port, or installing anything beyond the page they already have. The peer-to-peer plumbing is a PHP relay that long-polls between the two browsers: the host keeps a /poll connection open, the guest's iframe sends requests through /relay/<sid>/request/..., the relay shuttles them to the host, the host processes them through its own in-browser PHP-WASM, and the response comes back the same way.

The relay is just one self-contained PHP file so it can drop into WP.com Atomic, a static host with PHP, or anywhere PHP runs. There is no Node, no database, no bespoke server.

What's in this PR

The relay itself. packages/playground/website/public/relay.php implements the full wire protocol: session creation, host long-poll, guest request tunneling, response delivery, status / heartbeat, and explicit close. Sessions live under a per-system temp directory (never under the public web root). Concurrent dispatch is flock()-protected so two pollers can't double-deliver the same request. Hosts that stop polling are marked dead within ~40s, in-flight guest requests fail fast instead of hanging for 30s, and a guest opening the share link a few ms ahead of the host's first poll waits briefly instead of getting an immediate 503.
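The wire protocol above boils down to a handful of URL shapes. A minimal TypeScript sketch, where the helper names (and the exact placement of the session id in the host endpoints) are assumptions rather than the relay.php contract:

```typescript
// Hypothetical helpers illustrating the endpoint layout described above.
type HostEndpoint = "poll" | "status" | "close";

function hostUrl(base: string, sid: string, endpoint: HostEndpoint): string {
  // Host-side endpoints: long-poll for work, heartbeat, explicit close.
  return `${base.replace(/\/$/, "")}/relay/${sid}/${endpoint}`;
}

function guestRequestUrl(base: string, sid: string, path: string): string {
  // Guest traffic is tunneled under /relay/<sid>/request/<original path>.
  return `${base.replace(/\/$/, "")}/relay/${sid}/request/${path.replace(/^\//, "")}`;
}
```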

A prominent Share button. Previously buried inside Site Manager → Additional actions, it now lives directly in the main toolbar where people can actually see it.

A live collaborator list. Each guest tab generates a stable UUID, sends it on every status heartbeat, and gets a sticky "Guest 1", "Guest 2", … label. The host's modal polls /status every 3s and shows the live count plus labels; guests are pruned ~10s after they stop checking in.
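A minimal sketch of that heartbeat bookkeeping, assuming a roster class (the class and method names are hypothetical; the real tracking lives in relay.php):

```typescript
// Assumed shapes: track guests by UUID, hand out sticky "Guest N" labels in
// arrival order, and prune anyone quiet for longer than ttlMs.
type Guest = { label: string; lastSeen: number };

class GuestRoster {
  private guests = new Map<string, Guest>();
  private nextLabel = 1;

  // Called on every /status heartbeat; returns the guest's sticky label.
  heartbeat(guestId: string, now: number): string {
    const existing = this.guests.get(guestId);
    if (existing) {
      existing.lastSeen = now;
      return existing.label;
    }
    const label = `Guest ${this.nextLabel++}`;
    this.guests.set(guestId, { label, lastSeen: now });
    return label;
  }

  // Drops guests that stopped checking in; returns the surviving labels.
  prune(now: number, ttlMs = 10_000): string[] {
    for (const [id, guest] of this.guests) {
      if (now - guest.lastSeen > ttlMs) this.guests.delete(id);
    }
    return [...this.guests.values()].map((g) => g.label);
  }
}
```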

A host-disconnected overlay on the guest. When the host stops sharing — clicked Stop Sharing, closed the tab, or just walked away — the guest's banner flips from "● Connected" to "● Host disconnected" and a frozen-iframe overlay explains what happened, instead of the iframe sitting on a stale request that times out after 30s.

One relay code path everywhere. Earlier iterations of this PR ran an in-memory TypeScript relay middleware in dev and relay.php only in production, which is exactly the "works in dev, broken in prod" arrangement we want to avoid. Dev now spawns php -S 127.0.0.1:5264 relay.php automatically alongside the vite server (see the dev:relay-php nx target) and proxies /relay/* to it. Same code, same wire protocol, same failure modes — there is no other implementation to drift from.
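The dev-time proxy arrangement could look roughly like this in vite.config.ts; the port matches the PR text, but this is a sketch and the actual config block in the repo is richer:

```typescript
import { defineConfig } from "vite";

// Sketch: forward /relay/* to the php -S process spawned by the
// dev:relay-php target, so dev and prod exercise the same relay.php.
export default defineConfig({
  server: {
    proxy: {
      "/relay": {
        target: "http://127.0.0.1:5264",
        changeOrigin: true,
      },
    },
  },
});
```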

End-to-end tests. packages/playground/website/playwright/e2e/sharing.spec.ts covers opening the modal from both the toolbar and the dropdown, starting and stopping a share, copying the share URL, the single-guest happy path through the relay, the multi-guest collaborator list growing from 0 → 1 → 2 and shrinking back to 1 when a guest closes its tab, and the host-disconnected overlay appearing on the guest after the host stops sharing. Each multi-tab test opens its guests in isolated BrowserContexts and drives the heartbeat by hand from the test (see pingGuestHeartbeat) — same-context tabs in headless Chromium would otherwise compete for active-tab focus and starve each other's setInterval.

Testing Instructions

npm run dev

Then in two browser tabs:

  1. Open http://localhost:5400/website-server/. Wait for WordPress to load. Click Share in the toolbar, then Start Sharing, and copy the link.
  2. Open the share link in a second tab. The guest should connect within a few seconds and render the same WordPress site, with the host's admin bar visible.
  3. Back in the host tab, the modal should now say "1 collaborator connected" and show "Guest 1".
  4. Open the share link in a third tab. Host modal flips to "2 collaborators connected" with both Guest 1 and Guest 2.
  5. Close the third tab. Host modal should drop back to "1 collaborator connected" within ~10s.
  6. Click Stop Sharing on the host. The remaining guest should immediately flip to "● Host disconnected" with a frozen-iframe overlay.

To run the automated suite:

npx nx run playground-website:e2e:playwright -- --project=chromium sharing.spec.ts

Possible follow-ups

  • Direct WebRTC peer-to-peer so the relay only carries the initial handshake.
  • Letting the guest keep using the shared Playground after the host disappears.
  • Read-only mode for guests.
  • Surfacing relay errors in the host modal instead of only on the guest.

adamziel added 10 commits April 7, 2026 12:26
Enables sharing a Playground instance with others through an HTTP long-polling relay. The host browser processes WordPress requests and sends responses back through the relay to guest browsers.

Key components:
- Relay middleware for Vite dev server
- TunnelHost class for host-side request processing
- SharedPlaygroundViewer for guest-side rendering
- Share modal UI with copy-to-clipboard functionality
- URL rewriting for HTML, CSS, and redirect headers

The sharing feature now persists when the modal is closed, allowing
users to share their Playground in the background. A status indicator
in the toolbar shows when sharing is active. Clicking it reopens the
share modal for management.

Also adds a PHP implementation of the relay server for production
deployments where Node.js isn't available. The JavaScript relay
continues to be used for development.

The php-relay-middleware and relay-middleware files use Node.js modules
(fs) and should not be exported from the client-facing barrel file.
This was causing "fs.readFileSync" errors in the browser.

When a host long-poll times out, the middleware tried to remove its
resolver from the pollResolvers array by searching for the request
Promise via indexOf. The array actually holds resolver functions, so
the lookup never matched and stale entries piled up. A later guest
request would then shift() a dead resolver instead of the live one,
silently dropping the request until it hit the 30s timeout and bubbled
up as a 504 Gateway Timeout. Keep a direct reference to the resolver function
and use that for the cleanup so the array stays accurate.
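The shape of the fix, sketched in TypeScript (only pollResolvers and the timeout path come from the commit text; the surrounding middleware is elided):

```typescript
// Hold a direct reference to the resolver function pushed into pollResolvers
// and remove exactly that entry on timeout.
type Resolver = (request: unknown) => void;
const pollResolvers: Resolver[] = [];

function waitForRequest(timeoutMs: number): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const resolver: Resolver = resolve;
    pollResolvers.push(resolver);
    setTimeout(() => {
      // Before the fix this searched for the request Promise, never matched,
      // and left a dead resolver behind. indexOf on the function itself
      // always finds the right entry.
      const i = pollResolvers.indexOf(resolver);
      if (i !== -1) {
        pollResolvers.splice(i, 1);
        reject(new Error("poll timeout"));
      }
    }, timeoutMs);
  });
}
```

A dispatcher delivers a request by shifting a resolver off the array and calling it; the later timeout then finds nothing to remove and stays silent.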
Closing the host tab used to leave the guest staring at a live-looking
shell that would silently time out on the next click, surfacing a raw
"Gateway timeout" JSON blob thirty seconds later. Now the relay tracks
when the host last polled, the host fires a sendBeacon to an explicit
/close endpoint on pagehide, and the guest polls a small /status
endpoint and drops a friendly "Host disconnected" overlay as soon as
the session goes cold.

The Share action used to live three clicks deep inside the Site Manager
site-info panel, where nobody was likely to find it. Promote it to a
primary button in the browser-chrome toolbar so a first-time visitor
sees it right next to Save and the site switcher.

The service worker also starts letting /relay/* traffic pass through to
the network instead of trying to serve it from the cache, so the host
can actually open a sharing session from the same tab.

When a host clicks Share they are mostly flying blind — there is no
way to tell whether anyone actually joined. This turns the share modal
into a live collaborator panel: the relay tracks each guest tab by a
stable UUID it heartbeats on every /status poll, and the host re-polls
that same endpoint to render a pill for every guest (anonymous 'Guest
1', 'Guest 2' labels are plenty for now). Guests that go quiet for
more than ten seconds drop off on their own, so closing a tab visibly
shrinks the list.

Two things broke when the PHP relay became the only relay path in dev
mode. First, the host's TunnelHost kicks off /poll in the background
and returns from startSharing immediately, so a guest opening the
share link a few milliseconds later races the host's first poll, and
the guest's /request/ can land on the relay before hostConnected is
true. With the in-process TS middleware everything was synchronous
and the race was invisible; the file-based PHP relay loses it
routinely. Wait briefly for the host to show up before bailing.

Second, vite's proxy uses changeOrigin and rewrites the Host header
to the relay's own port, so when the host's TunnelHost rewrites
absolute WordPress URLs in the response HTML it misses every one of
them and the iframe loads a broken page. Forward X-Forwarded-Host
through as the Host header in the tunnel request when present.
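A minimal sketch of that header preference, with a hypothetical helper name:

```typescript
// Prefer X-Forwarded-Host whenever the proxy supplies it, so URL rewriting
// sees the browser-facing hostname instead of the relay's own port.
function effectiveHost(
  headers: Record<string, string | undefined>
): string | undefined {
  return headers["x-forwarded-host"] ?? headers["host"];
}
```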

While we're here, parallel playwright workers were starving the PHP
CLI server's pool because each long-poll holds its worker for 25s.
Bump PHP_CLI_SERVER_WORKERS to 20 so 3 simultaneous tests have room
to breathe.

php-relay-middleware.ts was an in-vite bridge that ran relay.php under
a single shared PHP-WASM instance. Now that the dev server proxies
/relay/* straight to a real php -S running in its own process, that
bridge is dead code — and it couldn't have served the long-polling
relay anyway, since one in-flight host poll would block every other
request through the shared instance.

The barrel re-exports of QueuedRequest and TunnelSession went with
it: those are server-side session-state types that nothing on the
client side ever touched.

adamziel force-pushed the add-share-button-php-relay branch from f30170e to c95c53b on April 7, 2026 17:45
adamziel added 13 commits April 7, 2026 20:38
The website's lint job runs with maxWarnings=0 and was failing on
leftover diagnostic console.log calls from when the relay was being
debugged interactively, on parameter properties (which the project
forbids because Node.js type stripping can't handle them), on a
couple of `Function` types in the TunnelHost listener bag, and on a
handful of small things — an `import()` type annotation in the
sharing test, an unused catch binding, a `let` that should have been
a `const`. Routine cleanup, no behavior changes.

The Playwright e2e config that CI uses spins up a static `vite
preview` server with the cors-proxy next to it, but no PHP relay —
so every share test that gets past the modal step (start-sharing,
stop-sharing, copy-to-clipboard, the multi-tab flows) was failing
because /relay/* hit the static preview server and got back HTML
instead of JSON. The CI orchestration script now boots a real
`php -S` for the relay alongside the cors-proxy, the vite preview
block proxies /relay/* through to it, and the relay advertises
share URLs at the right base for whichever context it's running in
(127.0.0.1 in CI, 127.0.0.1:5400/website-server/ in dev).

Two webkit-specific things tripped on the CI matrix that don't show
up locally on chromium.

The clipboard tests were trying to grant clipboard-write, which
doesn't exist in webkit's permission table. The "should start
sharing" test wasn't even using the clipboard, so the grant just
gets dropped there. The "should copy" test legitimately needs
clipboard-read to verify the copied URL, so it now asks for only
clipboard-read on webkit (which is what webkit accepts) and the
React handler's writeText() still works because it runs from a
real user gesture.

The two tests that open a guest tab with `context.newPage()` —
"should allow guest to view host playground" and the host-
disconnected overlay test — were deterministically failing on
webkit because same-context tabs in headless webkit compete for
focus and starve each other's poll loops. The earlier multi-guest
test already worked around this by giving each guest its own
BrowserContext; the same fix applies here.

navigator.clipboard.writeText is the modern path but webkit's
headless mode (and firefox in some configurations) rejects it with
NotAllowedError even after the right permission is granted. The
share modal now falls back to a hidden textarea + execCommand('copy')
when writeText is unavailable, and flips the Copy button to "Copied!"
either way so the click always feels responsive.

The e2e test stops trying to read the OS clipboard everywhere — it
now asserts on the user-visible "Copied!" label, which works in all
three browsers, and only on chromium (where Playwright's clipboard
permission grant is honored end to end) does it additionally read
the clipboard back to confirm the URL was actually placed there.

The relay's session, request and response state used to live as JSON
files under DATA_DIR with flock() handling concurrent access. That
works fine for single-host setups (dev, Atomic, anywhere every PHP
worker shares a disk) but it's not portable to multi-host
deployments where workers can't see each other's filesystems.

Pull all the storage operations behind a small RelayStorage interface
with two interchangeable backends: the existing flock-protected
FileRelayStorage (still the default, so an out-of-the-box checkout
keeps working without any database setup), and a new MysqlRelayStorage
that runs the same operations on InnoDB tables. The atomicity
guarantees are equivalent — flock(LOCK_EX) on the session file maps
to SELECT ... FOR UPDATE inside a short transaction, and the
non-blocking try-lock that prevents two pollers from grabbing the
same request maps to the same FOR UPDATE pattern on a single-row
SELECT against the requests table.

Pick a backend with the PLAYGROUND_RELAY_BACKEND env var. The MySQL
class reads its credentials from the standard WordPress DB_HOST,
DB_USER, DB_PASSWORD, DB_NAME and DB_PORT constants when defined —
so it can drop into a wp-config environment with zero extra wiring —
and falls back to env vars of the same name otherwise. The schema
is created lazily on first connect via CREATE TABLE IF NOT EXISTS.
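The real interface and both backends are PHP; as a language-neutral sketch of the seam's shape (method names are assumptions), an in-memory backend is enough to show the claim-once contract that flock() and SELECT ... FOR UPDATE both enforce:

```typescript
// Assumed shapes, illustrating the storage seam described above.
interface RelayStorage {
  enqueueRequest(sid: string, body: string): void;
  // Returns the next pending request exactly once, or undefined if none.
  claimNextRequest(sid: string): string | undefined;
}

class MemoryRelayStorage implements RelayStorage {
  private queues = new Map<string, string[]>();

  enqueueRequest(sid: string, body: string): void {
    const queue = this.queues.get(sid) ?? [];
    queue.push(body);
    this.queues.set(sid, queue);
  }

  claimNextRequest(sid: string): string | undefined {
    // In-process, shift() is atomic; the file backend needs flock() and the
    // MySQL backend a FOR UPDATE row lock to get the same guarantee.
    return this.queues.get(sid)?.shift();
  }
}
```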
Stand a real MySQL service container up alongside the playwright
runner, install pdo_mysql, wait for the server to come up, and
hand PLAYGROUND_RELAY_BACKEND=mysql plus the DB_* credentials to
the playwright subprocess so the relay's mysql storage class is
the one exercised end-to-end by sharing.spec.ts. The variables go
through on the sudo command line rather than via `sudo -E`
because Ubuntu's default sudoers policy resets the environment.

The file backend keeps its local round-trip smoke test, but every
share test that runs in CI now drives the mysql code path —
session create, withSession's SELECT ... FOR UPDATE, the
claimNextRequest dispatch race, and the cleanup query.

Adds an end-to-end test that does the thing the share feature
is supposed to do: the host edits a post in its in-browser
WordPress while a guest is connected through the relay, then
the guest navigates to that post and sees the new title. The
update goes through window.playgroundSites.getClient().run(),
the same path a real collaborative tool would use to mutate
host state, and the verification is a fresh navigation through
the relay tunnel — so we're catching anything that could break
live propagation, not just initial page delivery.

While editing the relay itself, two small things that didn't
sit right:

The MySQL backend used to fall back to localhost / root / empty
password / "playground_relay" if the credentials weren't set,
which is the kind of "helpful" default that hides
misconfiguration until something silently connects to the wrong
database. It now refuses to start without DB_HOST, DB_USER,
DB_PASSWORD and DB_NAME and tells the operator which one is
missing. DB_PORT still defaults to 3306 because that's the
universal MySQL port and not really a credential.

The session timeout was 30 minutes, which made no sense once
HOST_DEAD_AFTER_MS detected silent hosts in 40 seconds and
guests flipped to the disconnect overlay seconds after that.
Sessions only need to survive long enough for guests to render
the right UI — five minutes is comfortably more than that and
short enough that abandoned sessions stop piling up.

`npm run dev` boots five processes in parallel and the website
server (port 5400) used to start with a fixed `sleep 1` ahead of
it. That's not actually a readiness check — it's a guess — and
when the remote dev server (port 4400) is slow to bind, the very
first request the browser makes hits the website's vite proxy,
which forwards everything that isn't /website-server or /relay
to the remote, gets ECONNREFUSED, and prints a confusing
"http proxy error: /manifest.json" line before everything self-
heals on the next request.

Replace the sleep with a tiny portable port wait that polls
127.0.0.1:4400 until it accepts a TCP connection or times out at
30s. The dev:standalone target only starts once the remote is
genuinely ready, so the first navigation no longer races startup.
Two sources of noise: the PHP built-in server prints a "Development
Server started" banner per worker — twenty-one lines at startup with
PHP_CLI_SERVER_WORKERS=20 — plus an "Accepted"/"Closing" pair on
every request and a periodic "Failed to poll event" warning. And
when the relay is briefly unreachable (mid-restart, killed worker,
whatever), vite logs an "http proxy error" stack trace once per
guest poll — every three seconds, forever, for as long as the
browser tab stays open.

Wrap `php -S relay.php` in a small node helper that filters the
known-noise lines off stderr and forwards everything else, so real
PHP errors and our own error_log() output still surface. Same
wrapper for the dev:relay-php and preview:relay-php targets so CI
benefits too.

For the proxy noise, the relay proxy block in vite.config.ts now
has its own error handler that returns a clean 502 to the client,
and a custom logger filters the matching "http proxy error" line
out of vite's terminal output. Other proxy errors still log
normally.

The guest viewer used setInterval to drive its /status?gid= polling
loop, recomputed the request URL on every render, and depended on
those URLs in two separate useEffects. The combination meant that
every state update — flipping to "connected", an error message,
anything — produced new string references for relayBaseUrl and
statusUrl, both effects tore down and re-ran, fired a brand-new
fetch immediately, and the previous in-flight fetch was only
"logically" cancelled via a closure flag while the network request
kept running in the background. On a fresh share-URL load this
piled up several /status requests that the JS would never wait
for, the guest stayed in "connecting" forever, and only a manual
page refresh broke out of it.

Replace the two effects with a single self-scheduling loop:

- relayBaseUrl and statusUrl are now memoised so their references
  are stable across re-renders and the effect only re-runs when
  sessionId or guestId actually changes.
- A shared AbortController cancels the in-flight fetch the moment
  the component unmounts, instead of leaving it to the network.
- The polling rhythm is "fetch → wait for the response → setTimeout
  the next call" so two /status requests can never overlap.
- The initial /request/ probe and the /status loop share the same
  cancellation, the same controller, and the same sawHostAlive
  state — the previous code reset that flag on every effect re-run,
  which is also what made the host-disconnected detection brittle
  in the first place.
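The self-scheduling rhythm can be sketched like this (the function and callback names are assumptions, not the viewer's actual code):

```typescript
// One fetch at a time: fetch, wait for the response, then setTimeout the
// next call. A shared AbortController stops everything on unmount.
function startStatusLoop(
  statusUrl: string,
  onStatus: (status: unknown) => void,
  intervalMs = 3000
): () => void {
  const controller = new AbortController();
  const tick = async () => {
    try {
      const res = await fetch(statusUrl, { signal: controller.signal });
      onStatus(await res.json());
    } catch {
      if (controller.signal.aborted) return; // unmounted: stop quietly
    }
    if (!controller.signal.aborted) {
      setTimeout(tick, intervalMs); // never overlaps the previous request
    }
  };
  tick();
  return () => controller.abort(); // call from the effect cleanup
}
```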
The host's polling loop was firing handleRequest() concurrently for every
request it claimed from the relay, ignoring the requestQueue/processQueue
machinery sitting right next to it. PHP-Wasm in the host iframe is single-
threaded and not reentrant, so as soon as a guest opened a share URL and
the iframe fanned out a dozen sub-resource fetches, the host deadlocked
and every request 504'd. The visible symptom was a guest stuck on
"Connecting...".

Route the polled request through queueRequest() so handlers run one at a
time, which is what the existing queue was always meant to do.
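A promise-chain queue is one way to get that one-at-a-time behavior; this sketch borrows the queueRequest name from the commit text but none of its surrounding machinery:

```typescript
// Serialize handlers so the single-threaded PHP-WASM instance only ever
// sees one request at a time.
class RequestQueue {
  private tail: Promise<unknown> = Promise.resolve();

  queueRequest<T>(handler: () => Promise<T>): Promise<T> {
    // Run after whatever is currently queued, whether it succeeded or not.
    const result = this.tail.then(handler, handler);
    this.tail = result.catch(() => undefined); // a failure must not stall the queue
    return result;
  }
}
```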
The previous URL rewriter swept the response body with a handful of
regexes that could lose attributes the moment a perfectly legal HTML
construct showed up — a `>` inside a title attribute, an unquoted
src, a comment containing a fake tag, a URL string sitting inside a
<script> body. None of those are exotic; WordPress and its themes
emit them every day. Worse, the regexes happily rewrote URL-shaped
substrings inside JS literals, silently corrupting the script.

Use a real HTML parser instead. The host runs in a browser tab so
DOMParser is always there; the unit test runs under jsdom so the
parser shape is the same in both environments. The new module
classifies every URL through one isRewritableUrl() gate so href,
srcset, inline style, and standalone CSS all answer the same
question the same way: leave anchors, protocol-relative, data:,
javascript:, mailto:, tel:, third-party, and already-relayed URLs
strictly alone.
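A sketch of that single gate, with an illustrative subset of the rules (the PR's actual rule set may differ in detail):

```typescript
// Classify a URL found in href/src/srcset/style/CSS. Everything that is not
// same-origin or path-relative is left strictly alone.
function isRewritableUrl(
  url: string,
  siteOrigin: string,
  relayPrefix = "/relay/"
): boolean {
  const trimmed = url.trim();
  if (trimmed === "" || trimmed.startsWith("#")) return false; // anchors
  if (trimmed.startsWith("//")) return false; // protocol-relative
  if (/^(data|javascript|mailto|tel):/i.test(trimmed)) return false;
  if (trimmed.startsWith(relayPrefix)) return false; // already relayed
  if (/^https?:/i.test(trimmed)) {
    try {
      return new URL(trimmed).origin === siteOrigin; // third-party stays
    } catch {
      return false; // malformed absolute URL: do not touch it
    }
  }
  return true; // relative URL within the shared site
}
```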

The accompanying spec is intentionally adversarial. Every case is
there because at least one obvious regex approach gets it wrong —
keep it that way the next time someone is tempted to "simplify"
this back into a one-liner.

Two follow-ups to the queue fix that the security review on the
queue approach surfaced.

Stop Sharing used to be a soft suggestion. If a guest request was
already mid-flight when the user clicked stop, the in-flight
handleRequest would cheerfully complete its PHP run, build a
response, and POST it to /relay/null/response/... — emitting a
misleading error event from a session the user had already torn
down. Worse, since the host's WordPress is logged in as admin, a
guest write request landing 50 ms before the click could still
mutate the host's filesystem after the user thought they had cut
the connection.

We can't actually cancel a PHP request once it's running in the
worker, but we can refuse to forward its result. Each request now
gets its own AbortController, stopSharing() trips it, every await
checkpoint in handleRequest() and sendResponse() bails on a torn-
down session, and we double-check the session id matches the one
we started with so a fast Stop → Start cycle can't accidentally
deliver an old guest's response into a new session.

The polling loop also used to drain the relay as fast as it could,
appending to an unbounded in-memory queue. A misbehaving guest (or
just a slow PHP run) could grow that queue without limit. Cap it
at 32 entries and pause polling while it's full so the relay's
long-poll keeps the next request waiting on its side instead of
piling bytes into our RAM.
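The guard at each await checkpoint reduces to a small predicate; the names here are assumptions:

```typescript
// A response is only forwarded when sharing has not been aborted and the
// session it was produced for is still the active one, so a fast
// Stop -> Start cycle cannot leak a stale response into the new session.
function shouldDeliver(
  signal: AbortSignal,
  startedSessionId: string,
  currentSessionId: string | null
): boolean {
  return !signal.aborted && currentSessionId === startedSessionId;
}
```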