Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions spec/std/io/io_spec.cr
Original file line number Diff line number Diff line change
Expand Up @@ -796,6 +796,19 @@ describe IO do
io.encoding.should eq("UTF-8")
end

it "handles long lines correctly with invalid: :skip" do
# Using both ASCII characters and a 26-byte Unicode characters to
# ensure we hit as many byte boundaries inside the Unicode characters
# as we can to get sufficient confidence in this test.
text = "test string 👩🏾‍🤝‍👨🏻" * 10240
Comment on lines +800 to +803
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Do we really need specific single-/multi-byte characters at all to test this properly?
The original example only uses single-byte characters to reproduce the bug.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how iconv works and how familiar someone is with it. I don’t know anything at all about it, so I needed a test case that gives me sufficient confidence that the behavior introduced in this PR doesn’t count multi-byte characters that cross the 1024-byte boundary (for example: starts at byte 1022 and ends at byte 1030) as invalid.

I have no idea how iconv handles that scenario (it may very well protect against it, but again, I don’t know) and this test case shows that it handles it as expected. Without the multi-byte character, I couldn’t say for sure. Since I didn’t find any tests that exercised this scenario and it was easy to test, I added it in.

io = IO::Memory.new
io.set_encoding "UTF-8", invalid: :skip
io << text

io.bytesize.should eq text.bytesize
io.to_slice.should eq text.to_slice
end

it "does skips when converting to UTF-8" do
io = SimpleIOMemory.new(Base64.decode_string("ey8qx+Tl8fwg7+Dw4Ozl8vD7IOLo5+jy4CovfQ=="))
io.set_encoding("UTF-8", invalid: :skip)
Expand Down
13 changes: 11 additions & 2 deletions src/crystal/iconv.cr
Original file line number Diff line number Diff line change
Expand Up @@ -58,10 +58,19 @@ struct Crystal::Iconv
def convert(inbuf : UInt8**, inbytesleft : LibC::SizeT*, outbuf : UInt8**, outbytesleft : LibC::SizeT*)
{% if flag?(:freebsd) || flag?(:dragonfly) %}
if @skip_invalid
return LibC.__iconv(@iconv, inbuf, inbytesleft, outbuf, outbytesleft, LibC::ICONV_F_HIDE_INVALID, out invalids)
err = LibC.__iconv(@iconv, inbuf, inbytesleft, outbuf, outbytesleft, LibC::ICONV_F_HIDE_INVALID, out invalids)
if err == Crystal::Iconv::ERROR && Errno.value != Errno::E2BIG
return err
else
return
end
end
{% end %}
{{ USE_LIBICONV ? LibIconv : LibC }}.iconv(@iconv, inbuf, inbytesleft, outbuf, outbytesleft)

err = {{ USE_LIBICONV ? LibIconv : LibC }}.iconv(@iconv, inbuf, inbytesleft, outbuf, outbytesleft)
if err == Crystal::Iconv::ERROR && Errno.value != Errno::E2BIG
return err
end
end

def handle_invalid(inbuf, inbytesleft)
Expand Down
Loading