Fix `IO::Encoder#write` when operating on long strings by jgaskins · Pull Request #16797 · crystal-lang/crystal

jgaskins · 2026-03-30T04:30:20Z

This PR fixes encoding operations on long strings by ignoring `Errno::E2BIG` errors.

FWIW, this will need additional testing to ensure that strings with non-ASCII characters still work, but the test included with this PR fails without this patch and passes with it.

Fixes #16796

We accomplish this by ignoring `Errno::E2BIG` errors.
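
For context, iconv reports `E2BIG` when the output buffer has no room left, which is a cue to drain the buffer and call again rather than a conversion failure. The following is a minimal sketch of that loop, not the code from this PR: `toy_convert` and `encode_all` are hypothetical names, and the toy converter only copies bytes instead of calling iconv.

```crystal
# Hypothetical stand-in for iconv: copies as many bytes as fit into `output`
# and reports :e2big when `output` fills up before `input` is exhausted.
# Returns {status, bytes_consumed, bytes_written}.
def toy_convert(input : Bytes, output : Bytes) : {Symbol, Int32, Int32}
  n = Math.min(input.size, output.size)
  input[0, n].copy_to(output)
  status = n < input.size ? :e2big : :ok
  {status, n, n}
end

# Drains the fixed-size output buffer after every call and keeps going while
# the converter reports :e2big, so no input is skipped or dropped.
def encode_all(input : Bytes, chunk_size = 1024) : Bytes
  result = IO::Memory.new
  outbuf = Bytes.new(chunk_size)
  remaining = input
  loop do
    status, consumed, written = toy_convert(remaining, outbuf)
    result.write(outbuf[0, written])
    remaining += consumed
    break unless status == :e2big
  end
  result.to_slice
end

input = ("a" * 5000).to_slice
output = encode_all(input)
puts output.size
```

The point of the sketch is only that a full output buffer leaves the unread input intact, so the caller can resume from where the converter stopped.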

crysbot · 2026-03-30T04:31:45Z

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/data-loss-when-writing-long-lines-with-file-print-after-file-set-encoding/8830/4

We need to test single-byte ASCII characters but also multibyte Unicode characters to ensure we're encoding the characters correctly when that multibyte character lands on a buffer boundary.

ysbaddaden · 2026-04-02

The error should be handled by Crystal::Iconv#convert directly. That would fix all usages at once (IO::Encoding, String.encode, ...).

spec/std/io/io_spec.cr

Sija · 2026-04-02T19:11:42Z

src/io/encoding.cr

+        if err == Crystal::Iconv::ERROR
          @iconv.handle_invalid(pointerof(inbuf_ptr), pointerof(inbytesleft))
        end


Shouldn't this path be handled by Iconv#convert as well?

jgaskins · 2026-04-02

This is a great question. It seems like that would address the concern raised here.

But, to be clear, I'm not confident enough to make that decision. There may be a reason it's handled there that I don't have context for.

ysbaddaden · 2026-04-03

Good point. Every call site does exactly the same thing, which smells like copy-paste. Since the `#convert` method already handles invalid input on FreeBSD and DragonflyBSD, let's encapsulate the whole behavior into `#convert` 👍
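
The refactoring proposed here can be sketched roughly as follows. This is an illustration under stated assumptions, not the real `Crystal::Iconv`: `ToyConverter`, `write_encoded`, and `encode_to_bytes` are hypothetical names, and the toy treats any byte of `0x80` or above as invalid instead of calling iconv.

```crystal
# Hypothetical stand-in for an iconv wrapper; NOT the real Crystal::Iconv.
# It passes ASCII bytes through and treats any byte >= 0x80 as invalid.
class ToyConverter
  def initialize(@skip_invalid : Bool)
  end

  # The invalid-input policy lives here, so call sites never inspect
  # error codes or repeat the same recovery branch.
  def convert(input : Bytes, output : IO) : Nil
    input.each do |byte|
      if byte < 0x80
        output.write_byte(byte)
      elsif !@skip_invalid
        raise ArgumentError.new("Invalid byte 0x#{byte.to_s(16)}")
      end
      # When @skip_invalid is true, the invalid byte is silently dropped.
    end
  end
end

# Two hypothetical call sites that share the behavior without duplicating it.
def write_encoded(converter : ToyConverter, io : IO, text : String) : Nil
  converter.convert(text.to_slice, io)
end

def encode_to_bytes(converter : ToyConverter, bytes : Bytes) : Bytes
  buffer = IO::Memory.new
  converter.convert(bytes, buffer)
  buffer.to_slice
end

lenient = ToyConverter.new(skip_invalid: true)
strict = ToyConverter.new(skip_invalid: false)

cleaned = encode_to_bytes(lenient, Bytes[0x61, 0xff, 0x62])
puts String.new(cleaned)

sink = IO::Memory.new
write_encoded(strict, sink, "plain ascii")
puts sink.to_s
```

With the policy inside `convert`, a later change to error handling touches one method instead of every caller.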

straight-shoota · 2026-04-07T15:06:53Z

spec/std/io/io_spec.cr

+          # Using both ASCII characters and a 26-byte Unicode character to
+          # ensure we hit as many byte boundaries inside the Unicode characters
+          # as we can to get sufficient confidence in this test.
+          text = "test string 👩🏾‍🤝‍👨🏻" * 10240


question: Do we really need specific single-/multi-byte characters at all to test this properly?
The original example only uses single-byte characters to reproduce the bug.

jgaskins · 2026-04-07

It depends on how iconv works and how familiar someone is with it. I don’t know anything at all about it, so I needed a test case that gives me sufficient confidence that the behavior introduced in this PR doesn’t count multi-byte characters that cross the 1024-byte boundary (for example: starts at byte 1022 and ends at byte 1030) as invalid.

I have no idea how iconv handles that scenario (it may very well protect against it, but again, I don’t know) and this test case shows that it handles it as expected. Without the multi-byte character, I couldn’t say for sure. Since I didn’t find any tests that exercised this scenario and it was easy to test, I added it in.
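
The reasoning above can be checked with a small stand-alone script. This is a sketch, not the spec from the PR: it assumes a 1024-byte buffer (the figure quoted in this comment), measures boundaries on the UTF-16LE output, which is also an assumption, and builds the emoji from code point escapes so the zero-width joiners survive copy and paste.

```crystal
# The same emoji sequence as in the spec, written as escapes:
# woman, skin tone, ZWJ, handshake, ZWJ, man, skin tone.
cluster = "\u{1F469}\u{1F3FE}\u{200D}\u{1F91D}\u{200D}\u{1F468}\u{1F3FB}"
unit = "test string #{cluster}"
repeats = 10240
text = unit * repeats

encoding = "UTF-16LE"
unit_size = unit.encode(encoding).size
cluster_size = cluster.encode(encoding).size
chunk = 1024 # assumed buffer size, taken from the discussion

# Count how many emoji sequences start in one chunk and end in the next.
straddling = 0
repeats.times do |i|
  first = i * unit_size + (unit_size - cluster_size)
  last = first + cluster_size - 1
  straddling += 1 if first // chunk != last // chunk
end

# Write through the encoder, then decode the raw bytes again.
io = IO::Memory.new
io.set_encoding(encoding)
io.print text
round_trip = String.new(io.to_slice, encoding)

puts "cluster is #{cluster.bytesize} bytes in UTF-8"
puts "#{straddling} clusters straddle a #{chunk}-byte boundary"
puts round_trip == text ? "round trip OK" : "round trip lost data"
```

A non-zero straddle count confirms that the test string does place multi-byte sequences across buffer boundaries, and the round trip shows whether they come through intact.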

Fix IO::Encoder when operating on long strings

1862090


ysbaddaden added the kind:bug and topic:stdlib:text labels Mar 30, 2026

Use complex Unicode characters in the test string

05a8636


ysbaddaden requested changes Apr 2, 2026


ysbaddaden added this to the 1.20.0 milestone Apr 2, 2026

jgaskins added 2 commits April 2, 2026 13:33

Move error handling to Crystal::Iconv#convert

bc8c466

Merge branch 'master' into fix-io-encoder-with-long-strings

00b4d3e

Sija reviewed Apr 2, 2026


jgaskins added 2 commits April 4, 2026 00:14

Handle iconv errors inside of the Iconv wrapper

91d011d

Fix double call of handle_invalid in String

2eb7781

jgaskins force-pushed the fix-io-encoder-with-long-strings branch from 3463c2f to 2eb7781 April 4, 2026 17:09

straight-shoota reviewed Apr 7, 2026


jgaskins wants to merge 6 commits into crystal-lang:master from jgaskins:fix-io-encoder-with-long-strings
