Skip to content

Follow url metadata redirects#133

Open
femincan wants to merge 26 commits intoplayfulprogramming:mainfrom
femincan:feat/url-metadata-follow-redirects
Open

Follow url metadata redirects#133
femincan wants to merge 26 commits intoplayfulprogramming:mainfrom
femincan:feat/url-metadata-follow-redirects

Conversation

@femincan
Copy link
Copy Markdown
Member

@femincan femincan commented Mar 26, 2026

Closes #34

The reason why I didn't (and couldn't) use the onInfo method is that it only runs the callback for responses with 1xx status code.

onInfo ({statusCode: number, headers: Record<string, string | string[]>}) => void | null (optional) - Default: null - Callback collecting all the info headers (HTTP 100-199) received.

Todo:

  • Implement redirection for fetchAsBotStream function
  • Update the outdated tests according to the new implementation
  • Add tests for the function
  • Add tests related to redirection for both fetchAsBot and fetchAsBotStream

Unrelated Todo:

  • The net connection can be totally disabled (disableNetConnect) for mocks to make sure there is no real request to the net.
  • The mock agent can be reset before every test to prevent side effects between tests.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added streaming API for bot request handling, enabling efficient large response processing.
    • Enhanced redirect handling with support for additional HTTP status codes and improved location parsing.
    • Extended HTTP method support to include PUT and DELETE operations.
  • Improvements

    • Improved error messages for invalid redirect scenarios and better robots.txt compliance validation.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de67437e-b975-4a0f-8324-8862e8850900

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@femincan
Copy link
Copy Markdown
Member Author

femincan commented Mar 28, 2026

So as you suggested, I used stream factory to detect redirects and repeat the stream with the redirected url along the error-detection. I have some pin points and questions.

Pin points:

  • I extended the redirection status codes to include 303, 307, and 308 alongside 301 and 302 to cover more common redirection cases.
  • I moved the log for the redirections to the inside of the loop to log only if the redirect location is exist and valid.

Questions:

  • I used the createWriteStream(devNull) directly inside the factory function. Does this mean creating a closure as well maybe with the module level? If This is the case I can create a property placeholderStream inside the opaque object.
  • What is the difference between createWriteStream(devNull) and new Writable(write(_chunk, _encoding, next) {next()})? Is using OS null directory more efficient or because it is more concise?
  • Currently we are not returning any error for redirected urls which doesn’t have a valid location and just ignoring them by returning the writable. Is this the expected behavior?

@fennifith
Copy link
Copy Markdown
Member

I used the createWriteStream(devNull) directly inside the factory function. Does this mean creating a closure as well maybe with the module level? If This is the case I can create a property placeholderStream inside the opaque object.

Nope, it's only a closure if it references a local variable inside of the function it's declared within. You could ensure it doesn't create a closure by moving it to a top-level function and passing it in as a value.

What is the difference between createWriteStream(devNull) and new Writable(write(_chunk, _encoding, next) {next()})? Is using OS null directory more efficient or because it is more concise?

I have no idea! I don't think it would make a big difference which one is used TBH

Currently we are not returning any error for redirected urls which doesn’t have a valid location and just ignoring them by returning the writable. Is this the expected behavior?

Yeah, it would be good to return an error if the redirect can't be resolved (the same as we return an error for a non-200 status)

@femincan femincan marked this pull request as ready for review April 9, 2026 11:21
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/worker/src/utils/fetchAsBot.ts (1)

146-192: Consider using a shared no-op Writable instead of createWriteStream(devNull).

Each call to createWriteStream(devNull) opens a new file descriptor. While the FDs are closed when the stream ends and redirects are bounded by followRedirects, a singleton no-op Writable would avoid the overhead entirely.

♻️ Optional: Use a shared no-op Writable
+import { Writable } from "stream";
+
+const noopWritable = new Writable({
+  write(_chunk, _encoding, callback) {
+    callback();
+  },
+});
+
 const fetchAsBotStreamFactory: Dispatcher.StreamFactory<
   FetchAsBotStreamFactoryOpaque
 > = ({ opaque, statusCode, headers }) => {
   // ...
-      return createWriteStream(devNull);
+      return noopWritable;
   // ... (apply to all three return statements)
 };

Note: This is a minor optimization and the current implementation is functionally correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/utils/fetchAsBot.ts` around lines 146 - 192, The stream
factory fetchAsBotStreamFactory currently calls createWriteStream(devNull)
multiple times (opening FDs) when handling redirects or non-2xx statuses;
replace those repeated calls with a single shared no-op Writable singleton
(e.g., a module-scoped DevNullWritable) and return that instance instead of
calling createWriteStream(devNull) so you avoid opening multiple file
descriptors; keep existing behavior for setting opaque.error and opaque.redirect
and ensure the singleton is used wherever createWriteStream(devNull) currently
appears in fetchAsBotStreamFactory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/worker/src/utils/fetchAsBot.ts`:
- Around line 146-192: The stream factory fetchAsBotStreamFactory currently
calls createWriteStream(devNull) multiple times (opening FDs) when handling
redirects or non-2xx statuses; replace those repeated calls with a single shared
no-op Writable singleton (e.g., a module-scoped DevNullWritable) and return that
instance instead of calling createWriteStream(devNull) so you avoid opening
multiple file descriptors; keep existing behavior for setting opaque.error and
opaque.redirect and ensure the singleton is used wherever
createWriteStream(devNull) currently appears in fetchAsBotStreamFactory.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e2dee141-b1c2-4fe6-b848-30caa48a378d

📥 Commits

Reviewing files that changed from the base of the PR and between 66ba7e1 and d644702.

📒 Files selected for processing (3)
  • apps/worker/src/utils/fetchAsBot.test.ts
  • apps/worker/src/utils/fetchAsBot.ts
  • apps/worker/test-utils/server.ts

@femincan femincan requested a review from fennifith April 9, 2026 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modify url-metadata to follow redirects

2 participants